# Investigating helicopter data

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Loading the data
To load data from a `.csv` file into a Table, use the `Table.read_table` function. Run the code below to store the entire dataset in a Table named `helicopter`.

In [None]:
helicopter = Table.read_table("data/helicoptes2024.csv")

## Investigating the data

You will likely want to learn a bit of summary information regarding your data. There are several commands to help you do this quickly.


### Inspecting the data
You can start by looking at the first 10 rows of the Table by running a code cell that contains just the Table's name.

### Size of your data
You can determine the dimensions of your data using the `.num_rows` and `.num_columns` attributes.

### Summary statistics

You may wish to calculate some summary statistics on your data. The `numpy` library contains many statistical functions. First, choose the function you wish to use; then select the column to use in your computation with the `.column()` method.

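Common `numpy` statistics functions are listed below. In each pair, the `nan` version ignores missing (`nan`) values, while the plain version returns `nan` if any value is missing.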

* **Arithmetic Mean**: `np.mean` / `np.nanmean`
* **Median**: `np.median` / `np.nanmedian`
* **Standard Deviation**: `np.std` / `np.nanstd`
* **Variance**: `np.var` / `np.nanvar`
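The difference between the plain and `nan` versions can be sketched with a small array. The values in `times` are made up for illustration; in the notebook, an array like this would come from a call such as `helicopter.column('Time')`, assuming that column holds numbers.

```python
import numpy as np

# Hypothetical flight times in seconds; np.nan marks a missing measurement.
times = np.array([1.8, 2.3, np.nan, 2.9])

plain_mean = np.mean(times)        # nan, because one value is missing
safe_mean = np.nanmean(times)      # mean of the three recorded values
safe_median = np.nanmedian(times)  # median of the three recorded values
safe_std = np.nanstd(times)        # population standard deviation (ddof=0)

print(plain_mean, safe_mean, safe_median, safe_std)
```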

### Grouping your data

You'll often want to perform calculations on a particular subgroup of your dataset. You can use the `.group` function to help you perform such a task.

The `.group` function takes as its argument the label of the column that contains the categories. By default it returns a table of counts of rows in each category.

## Filtering the data

You may want to use only part of your dataset at a time. You can use the `.where` function with a predicate to keep just the rows/observations you're interested in. **Note:** This does not modify the original table at all; it creates a new table that contains the requested rows.

Common predicates used in filtering are:

* `are.equal_to(x)`
* `are.not_equal_to(x)`
* `are.above(x)`
* `are.above_or_equal_to(x)`
* `are.below(x)`
* `are.below_or_equal_to(x)`
* `are.between(x, y)`
* `are.between_or_equal_to(x, y)`

If you want to save the result of a filter, you need to assign it a new name. It is a best practice not to overwrite any tables in a notebook, but instead create a new table with a new name to store any filtered or otherwise modified data that you intend to use again later in the notebook.

The code below will create a new Table `good_helicopter` that contains only the observations that have numerical times and were not labeled as containing an anomaly in the flight.

In [None]:
good_helicopter = helicopter.where('Time', are.not_equal_to("nan")).where('Anomaly', are.equal_to("No"))

## Visualizing the data

You can easily create bar charts, scatter plots, line plots, and histograms depending on the data you are hoping to visualize.

### Bar charts

To create a bar chart, you need a Table that contains a frequency count of a categorical variable. We can use the `.group` function to create such a table, and then generate the bar chart using the `bar()` function. In both cases, be sure to specify the column whose categories you want to count and visualize. **Note:** When creating frequency bar charts, it's usually best to sort the data before creating the chart. Use the `.sort` function to sort by frequency in descending order before creating the bar chart.

### Histograms

To investigate a distribution of numerical values, use a histogram! The `.hist()` function can handle this quickly. Specify the numerical column you wish to create a histogram from.