# Pandas

Last time we learned about Pandas and how we can use it to work with data. Let's recap some of the key points.

# Install

To use Pandas, we need to install it. It's often already installed on your system, but if not, you can use this command:

In [None]:
# %pip installs into the environment of the running notebook kernel
%pip install pandas
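If you're not sure whether pandas is already installed, a quick check like this sketch tells you before you run pip. It uses the standard-library function `importlib.util.find_spec`; the variable name `pandas_installed` is just for illustration.

```python
import importlib.util

# find_spec returns None when a module can't be found on the import path
pandas_installed = importlib.util.find_spec("pandas") is not None

if pandas_installed:
    import pandas as pd
    print("pandas", pd.__version__, "is installed")
else:
    print("pandas is not installed; run the pip command above")
```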

# Import

Once installed, we just need to import it so we can use it in our scripts:

In [None]:
import pandas as pd

# Create a dataframe

In Pandas, we hold data in dataframes. These are two-dimensional tables: each column has a label, and each row has a label in the dataframe's index.

There are many ways to get data into a dataframe. We looked at two common ways.

We can read in a CSV file with `pd.read_csv()`:

In [None]:
file_path = "../workshop_4/pandas_data.csv"
data_df = pd.read_csv(file_path)
data_df
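Since the contents of `pandas_data.csv` aren't reproduced here, this sketch feeds `read_csv` some made-up CSV text through `io.StringIO` instead of a file path. The column names `name` and `score` are hypothetical. It also shows two handy first checks on freshly loaded data: `head()` for the first rows and `dtypes` for the column types pandas inferred.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for pandas_data.csv
csv_text = """name,score
Ana,82
Ben,75
Cara,91
"""

# read_csv accepts any file-like object, not just a path
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))  # first two rows
print(sample_df.dtypes)   # the column types pandas inferred
```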

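And we can create them manually, for example from a dictionary where each key becomes a column name and each list holds that column's values: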

In [None]:
df = pd.DataFrame({
    "numbers": [1, 2, 3],
    "letters": ["A", "B", "C"],
    "fruits": ["apples", "bananas", "cherries"]
})
df
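To see the table structure of a dataframe, you can inspect its shape, its column labels and its row labels (the index). This sketch rebuilds the same data under the hypothetical name `example_df` so it doesn't overwrite `df`.

```python
import pandas as pd

example_df = pd.DataFrame({
    "numbers": [1, 2, 3],
    "letters": ["A", "B", "C"],
    "fruits": ["apples", "bananas", "cherries"]
})

print(example_df.shape)             # (3, 3): 3 rows, 3 columns
print(example_df.columns.tolist())  # ['numbers', 'letters', 'fruits']
print(example_df.index.tolist())    # [0, 1, 2]: default integer row labels
```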

# Manipulating Data

Once you create a dataframe, there are many methods you can use to manipulate the data. Here are a few we talked about:

## Add a column

Assign a list to a new column name using square-bracket syntax. The list needs one value per row:

In [None]:
df["vegetables"] = ["artichokes", "brussels sprouts", "cabbages"]
df
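A new column doesn't have to come from a hand-written list. This sketch, using a hypothetical `demo_df` with made-up column names `doubled` and `source`, computes one column from another with arithmetic, and fills another with a single value that pandas repeats down every row.

```python
import pandas as pd

demo_df = pd.DataFrame({"numbers": [1, 2, 3]})

# Arithmetic on a column applies row by row
demo_df["doubled"] = demo_df["numbers"] * 2

# A single value is repeated down every row
demo_df["source"] = "manual"

print(demo_df)
```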

## Remove a column

In [None]:
df = df.drop("numbers", axis=1) # axis is 0 (rows) by default
df
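The `axis=1` argument tells `drop` to look for the label among the columns. This sketch, on a hypothetical `demo_df`, shows the equivalent `columns=` keyword, what happens with the default `axis=0` (a row is dropped by its index label), and that `drop` returns a new dataframe, which is why the cell above reassigns `df`.

```python
import pandas as pd

demo_df = pd.DataFrame({"numbers": [1, 2, 3], "letters": ["A", "B", "C"]})

# Same result as demo_df.drop("numbers", axis=1)
no_numbers = demo_df.drop(columns="numbers")

# With the default axis=0, the label refers to a row in the index
no_first_row = demo_df.drop(0)

# drop returns a new dataframe; the original is unchanged unless reassigned
print(demo_df.columns.tolist())  # ['numbers', 'letters']
```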

## Sort by a column

In [None]:
df = df.sort_values("vegetables", ascending=False) # ascending is True by default
df
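Sorting keeps each row's original index label, so after `sort_values` the index is no longer in 0, 1, 2 order. This sketch, on a hypothetical `demo_df`, also sorts by two columns at once (with a matching list for `ascending`) and then renumbers the rows with `reset_index(drop=True)`.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "fruits": ["bananas", "apples", "apples"],
    "numbers": [1, 2, 3],
})

# Sort by fruit A-Z, then by number largest-first within each fruit
sorted_df = demo_df.sort_values(["fruits", "numbers"], ascending=[True, False])
print(sorted_df.index.tolist())  # [2, 1, 0]: original labels are kept

# Renumber the rows 0, 1, 2, ... and discard the old labels
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```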

## Filter a dataframe

In [None]:
df_only_cherries = df[df["fruits"] == "cherries"]
df_only_cherries
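The filter works because `df["fruits"] == "cherries"` produces a Series of True/False values (a boolean mask), and indexing the dataframe with it keeps only the rows marked True. This sketch, on a hypothetical `demo_df`, combines conditions with `&` (and); `|` works the same way for "or". Each condition needs its own parentheses because of Python's operator precedence. It also uses `isin` to match several values at once.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "numbers": [1, 2, 3],
    "fruits": ["apples", "bananas", "cherries"],
})

# A boolean mask: one True/False per row
mask = (demo_df["numbers"] > 1) & (demo_df["fruits"] != "cherries")
print(mask.tolist())  # [False, True, False]

only_bananas = demo_df[mask]

# Keep rows whose fruit is any of the listed values
apples_or_cherries = demo_df[demo_df["fruits"].isin(["apples", "cherries"])]
print(apples_or_cherries)
```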

# Grouping

Below, we group on the "fruits" column to find out how many rows there are for each value. `.size()` returns a Series of counts indexed by fruit; `.reset_index(name="Count")` turns it back into a dataframe and names the column of counts "Count".

In [None]:
df_grouped_data = df.groupby(["fruits"]).size().reset_index(name="Count")
df_grouped_data
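In `df`, each fruit appears only once, so every count is 1. With repeated values the counts become more informative. The `orders` data in this sketch is made up for illustration. It also shows `value_counts()`, which gives similar counts directly as a Series, ordered from most to least frequent.

```python
import pandas as pd

orders = pd.DataFrame({
    "fruits": ["apples", "bananas", "apples", "cherries", "apples"],
})

# size() counts rows per group; groups are sorted by key by default
counts = orders.groupby(["fruits"]).size().reset_index(name="Count")
print(counts)

# A shortcut that returns a Series of counts, most frequent first
print(orders["fruits"].value_counts())
```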

# Making charts

You can use the `.plot()` method to create charts (pandas draws them with Matplotlib, which must also be installed). There are many kinds of chart you can make; here's a simple bar chart of the fruit counts.

In [None]:
df_grouped_data.plot(x='fruits', y='Count', kind='bar', title='Fruit counts')