# DataFrame

- Used for working with tabular data in Julia (provided by the DataFrames.jl package)
- Start with: `using DataFrames`

# Creating a DataFrame

- Pass each column as a keyword argument: `column_name = vector_of_values` (all columns must have the same length)

```
df = DataFrame(
    col_1 = [1, 2, 3, 4, 5],
    col_2 = ['A', 'B', 'C', 'D', 'E'],
    col_3 = [true, false, true, true, false],
)
```

# Load CSV files

```
using CSV, DataFrames
# Read the CSV file
file = CSV.File("file_name.csv")
# Convert the parsed CSV file into a DataFrame
df = DataFrame(file)
```
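
The two steps above can also be collapsed into one call. A minimal sketch, assuming CSV.jl and DataFrames.jl are installed; the sample data and the temporary file are hypothetical and exist only to make the snippet self-contained:

```julia
using CSV, DataFrames

# Hypothetical data written to a temporary file (illustration only)
original = DataFrame(id = [1, 2, 3], name = ["ann", "bob", "cy"])
path = joinpath(mktempdir(), "file_name.csv")
CSV.write(path, original)

# Read the file straight into a DataFrame in a single step
df = CSV.read(path, DataFrame)
println(df)
```

`CSV.read(path, DataFrame)` should give the same result as `DataFrame(CSV.File(path))`; which one to use is mostly a matter of taste.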

# Inspect Top Rows

```
println(first(df, 3))
```

# See column names

`println(names(df))`

# See tabular data size

`println(size(df))`
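
The three inspection calls above can be tried together. A small sketch on a hypothetical five-row table (the column names `id` and `label` are made up for illustration):

```julia
using DataFrames

# Hypothetical sample data, used only for illustration
df = DataFrame(id = 1:5, label = ['A', 'B', 'C', 'D', 'E'])

top = first(df, 3)   # DataFrame holding the first 3 rows
cols = names(df)     # column names as a vector of strings
dims = size(df)      # (number of rows, number of columns)

println(top)
println(cols)
println(dims)
println(nrow(df), " rows, ", ncol(df), " columns")
```

`nrow` and `ncol` are handy when only one of the two dimensions is needed.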

# Extract DataFrame elements

- Extract single value : `df[row_num, col_num]`
- Extract multiple row values : `df[row_num_start : row_num_end, col_num]`
- Extract multiple row and multiple column values : 
    - `df[row_num_start : row_num_end, col_num_start : col_num_end]`
- Extract all rows of a column : 
    - `df[ : , col_num]`
    - `df[ : , "col_name"]` (double quotes; single quotes create a `Char` in Julia)
    - `df.col_name`
- Extract specific row value of a column:
    - `df.col_name[row_num]`
    - `df[row_num, col_num]`
    - `df[row_num, "col_name"]`
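
The indexing forms listed above can be sketched on a small table. The data and column names here are hypothetical; note that Julia indices start at 1:

```julia
using DataFrames

# Hypothetical sample data, used only for illustration
df = DataFrame(
    id = [1, 2, 3, 4, 5],
    label = ['A', 'B', 'C', 'D', 'E'],
    flag = [true, false, true, true, false],
)

single = df[2, 1]          # value at row 2, column 1
rows = df[2:4, 2]          # rows 2 to 4 of column 2, returned as a vector
block = df[2:4, 1:2]       # rows 2 to 4 of columns 1 to 2, returned as a DataFrame
by_name = df[:, "label"]   # whole column by name (a copy of the data)
by_field = df.label        # whole column via property access (no copy)
cell = df[3, "label"]      # row 3 of the "label" column

println(single)
println(rows)
println(block)
println(cell)
```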


# Sorting a DataFrame

- Ascending : `df_sort = sort(df, "col_name")`
- Descending : `df_sort = sort(df, "col_name", rev = true)`
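
A short sketch of both sort directions, using a hypothetical `score` column. `sort` returns a new DataFrame and leaves the original untouched:

```julia
using DataFrames

# Hypothetical sample data, used only for illustration
df = DataFrame(name = ["a", "b", "c"], score = [20, 5, 12])

asc = sort(df, "score")                # smallest score first
desc = sort(df, "score", rev = true)   # largest score first

println(asc)
println(desc)
```

If a copy is not wanted, `sort!(df, "score")` sorts the table in place instead.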


# See descriptive statistics

- `println(describe(df))` - Print summary statistics for every column
- `using Statistics` (standard library; note the capital `S`)
    - `mean()` - Calculate mean of array
    - `median()` - Calculate median value of array
    - `std()` - Calculate standard deviation of array values
    - `var()` - Calculate variance of array values
- Built into Julia, no import needed
    - `sum()` - Calculate sum of array
    - `minimum()` - Find minimum value in array
    - `maximum()` - Find maximum value in array
- example: `total = sum(df[:, "col_name"])`
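
The functions above operate on a single column at a time. A minimal sketch, assuming a hypothetical numeric column named `price`:

```julia
using DataFrames
using Statistics

# Hypothetical sample data, used only for illustration
df = DataFrame(price = [10.0, 20.0, 30.0, 40.0])

avg = mean(df.price)           # arithmetic mean
med = median(df.price)         # middle value
spread = std(df.price)         # sample standard deviation
variance = var(df.price)       # sample variance
total = sum(df[:, "price"])    # sum of the column
lowest = minimum(df.price)
highest = maximum(df.price)

stats = describe(df)           # one summary row per column
println(stats)
println((avg, med, spread, variance, total, lowest, highest))
```

Note that `std` and `var` use the sample (n - 1) formula by default.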
    

# Column Operations

<center><img src="images/04.09.jpg"  style="width: 400px; height: 300px;"/></center>


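# Creating a new column

- `df[:, "col_name"] = df.col_A ./ df.col_B`
- `df.col_name = df.col_A ./ df.col_B`
- The dot in `./` applies the division element by element, so the result has one value per row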
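
Both assignment forms can be sketched on a small table. The column names (`distance_km`, `time_h`, and the derived ones) are hypothetical and chosen only for illustration:

```julia
using DataFrames

# Hypothetical sample data, used only for illustration
df = DataFrame(distance_km = [10.0, 21.0, 42.0], time_h = [1.0, 2.0, 4.0])

# Property syntax: element-wise division creates one value per row
df.speed_kmh = df.distance_km ./ df.time_h

# Indexing syntax: same idea, with the new column name given as a string
df[:, "pace_min_per_km"] = 60 .* df.time_h ./ df.distance_km

println(df)
```

The property form is shorter, while the string form also works for column names that are not valid Julia identifiers.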

# Filtering

- Row-wise operation: keeps only the rows for which the given function returns `true`
- `df_filtered = filter(row -> row.col_name <= 3000, df)`
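
A small sketch of row filtering, using a hypothetical `price` column and the same threshold of 3000. The second and third forms are alternatives that should give the same rows:

```julia
using DataFrames

# Hypothetical sample data, used only for illustration
df = DataFrame(item = ["pen", "laptop", "desk"], price = [2, 4500, 3000])

# Row-wise predicate, as in the line above
by_row = filter(row -> row.price <= 3000, df)

# Column-wise predicate: only the named column is passed to the function
by_col = filter(:price => p -> p <= 3000, df)

# Boolean mask built with a broadcast comparison
by_mask = df[df.price .<= 3000, :]

println(by_row)
```

`filter` returns a new DataFrame; `filter!` with the same arguments removes the rows from `df` itself.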
