# Intro

## Welcome to our Pandas Workshop!

Pandas is a Python library that is great for working with datasets of many kinds. In this workshop, you'll learn how to set up a pandas DataFrame and some essential functions to use on your data. 

## What are we covering today?

First we'll discuss some pandas concepts:

- DataFrames
- Series \(and the `index` and `values` of a Series\)
- Arrays \(NumPy arrays\)
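
As a quick preview of how these three concepts fit together, here is a minimal sketch. The table is made up for illustration (the band names and years are not from the workshop data).

```python
import numpy as np
import pandas as pd

# A tiny, made-up DataFrame (illustrative values only)
df = pd.DataFrame(
    {"band": ["Alpha", "Beta", "Gamma"], "formed": [1985, 1992, 2001]}
)

# Selecting a single column gives a Series
formed = df["formed"]

# A Series has an index (the row labels) and values (the underlying data)
print(formed.index)   # RangeIndex(start=0, stop=3, step=1)
print(formed.values)  # [1985 1992 2001], stored as a NumPy array
```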

Then we'll work through a standard data workflow:

- Reading in our data and cleaning it
- Looking at our DataFrame
- Selecting a particular column or columns
- Summarizing data
- Restructuring data
- Plotting data
- Exporting data



# Using Jupyter Notebooks

General: 

- Run cell: Shift \+ Enter
- Activate command mode: ESC

<span style='color:#42a5f5'>Command mode</span>: \(Hit escape to stop editing a cell and enter this mode\)

- B: insert cell below

<span style='color:#66bb69'>Edit mode</span>:

- Tab: code completion/indent  



# Importing Data

Let's get some data to play with!



In [1]:
# type your code in the provided cells!

In [0]:
# use this space to import pandas


## A Realistic Depiction of Getting Data into Python

Exciting! We have some fresh new cyclic voltammetry data to analyze. Fortuitously, `pandas` has a function called `read_csv` designed for loading tabular data. Let's do it!
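
Real instrument files often need a little coaxing before `read_csv` parses them cleanly. The sketch below uses a made-up, in-memory stand-in for such a file (two metadata lines followed by tab-separated columns). The layout, column names, and values are assumptions for illustration; the workshop file may be structured differently.

```python
import io
import pandas as pd

# Made-up stand-in for an instrument output file: two lines of metadata,
# then tab-separated columns. The real file's layout may differ.
raw_text = (
    "Instrument: hypothetical potentiostat\n"
    "Run: 1\n"
    "potential\tcurrent\n"
    "0.10\t1.5\n"
    "0.20\t2.5\n"
)

# skiprows skips the metadata lines; sep tells pandas which delimiter to use
cv = pd.read_csv(io.StringIO(raw_text), sep="\t", skiprows=2)
print(cv)
```

`read_csv` also accepts a file path or a URL in place of the `io.StringIO` object used here.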



In [0]:
# try reading in this file:
"https://raw.githubusercontent.com/orioncohen/metal-bands-by-nation/main/cyclic_voltammetry_output.txt"

In [0]:
# attempt 2 at reading in the file


In [0]:
# Make a data frame from each of these files:
"https://raw.githubusercontent.com/orioncohen/metal-bands-by-nation/main/world_population_1960_2015.csv"
"https://raw.githubusercontent.com/orioncohen/metal-bands-by-nation/main/new_bands.csv"

In [5]:
print("your code here")

your code here


# How is our data structured?



In [2]:
# let's look at the first few rows of the bands data frame


In [0]:
# let's look at the last few rows of the bands data frame


In [0]:
# how do we see a summary of what's contained in our table?


In [0]:
# what do the first few rows of the world population data frame look like?


In [0]:
# How do we look at just the 'country' column of the world population data frame?


In [0]:
# What if we want to look at a few columns? Try looking at the 'genre' and 'theme' columns of the bands data frame


In [0]:
# How many times does each country appear in the data frame?


In [0]:
# When did most metal bands form?
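
One possible way to approach the questions in this section, sketched on a small made-up table. The column names and values here are assumptions for illustration, so adapt them to the actual bands and population data.

```python
import pandas as pd

# Made-up stand-in for the bands table (column names are assumptions)
bands = pd.DataFrame(
    {
        "band": ["Alpha", "Beta", "Gamma", "Delta"],
        "country": ["Norway", "Sweden", "Norway", "Finland"],
        "genre": ["black", "death", "black", "power"],
    }
)

first_rows = bands.head(2)  # first 2 rows (the default is 5)
last_rows = bands.tail(2)   # last 2 rows
bands.info()                # prints column names, dtypes, and non-null counts

one_column = bands["country"]              # a single name gives a Series
two_columns = bands[["country", "genre"]]  # a list of names gives a DataFrame

# value_counts tallies how often each value appears, most frequent first
country_counts = bands["country"].value_counts()
print(country_counts)
```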


# So what is a DataFrame?



In [1]:
# what happens if we try to use Python's type function on a pandas data frame?


In [0]:
# you can access elements/columns of a data frame just like an array, try it out!!


In [0]:
# What's the type of an individual column of a data frame? 


In [0]:
# are the indices of an individual column and the entire data frame the same?
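
A minimal sketch of how these checks could look, again on a made-up table rather than the workshop data:

```python
import pandas as pd

# Made-up table (illustrative values only)
df = pd.DataFrame({"band": ["Alpha", "Beta"], "formed": [1985, 1992]})

print(type(df))            # <class 'pandas.core.frame.DataFrame'>
print(type(df["formed"]))  # <class 'pandas.core.series.Series'>

# A column keeps the row labels of the DataFrame it came from,
# so the two indices compare as equal
same_index = df.index.equals(df["formed"].index)
print(same_index)
```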


# Cleaning the data

What happens if some of our cells have no values? Missing values can be a nuisance in certain types of analysis. Let's start by removing any rows with missing values from our bands DataFrame. There is also a function, `fillna()`, that fills null values with a value supplied in the function call.
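
To see the difference between dropping and filling, here is a minimal sketch on a made-up table with one missing value. The column names are assumptions, not the workshop data.

```python
import numpy as np
import pandas as pd

# Made-up table with one missing value (NaN) in the "formed" column
bands = pd.DataFrame(
    {"band": ["Alpha", "Beta", "Gamma"], "formed": [np.nan, 1992.0, 2001.0]}
)

# dropna returns a new DataFrame without the rows that contain missing values
dropped = bands.dropna()

# fillna returns a new DataFrame with missing values replaced by the given value
filled = bands.fillna(0)

# Once the missing values are gone, astype can convert the years to integers
years = dropped["formed"].astype(int)

print(len(bands), len(dropped))  # 3 2
```

Both calls leave the original `bands` table unchanged, so the result needs to be assigned to a name to keep it.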


In [2]:
# Looking back at our bands data frame, it looks like there's a null value in the first row. Let's get rid of it.


### <span style='color:#020202'>What is the average age of a metal band?</span>



What values do we need? Are they already in the table? If not, how can we get them? 


<span style='font-size:x-large'>Creating a column</span>



<span style='font-size:x-large'>Some statistical analysis</span>


In [3]:
# try taking the mean, median, standard deviation, and variance of the data in the column we just created.
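
A minimal sketch of both steps, creating a column and then summarizing it. The `formed` column name, the definition of age as a reference year minus the formation year, and the reference year itself are all assumptions for illustration.

```python
import pandas as pd

# Made-up data; the "formed" column name and the reference year are assumptions
bands = pd.DataFrame(
    {"band": ["Alpha", "Beta", "Gamma"], "formed": [1980, 1990, 2000]}
)
reference_year = 2020  # hypothetical "current" year for the age calculation

# Assigning to a new column name creates the column
bands["age"] = reference_year - bands["formed"]

print(bands["age"].mean())    # 30.0
print(bands["age"].median())  # 30.0
print(bands["age"].std())     # 10.0 (sample standard deviation)
print(bands["age"].var())     # 100.0 (sample variance)
```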


## <span style='font-size:xx-large'>How many metal bands does each country have per capita in 2015? </span>

<span style='font-size:x-large'>Combining the bands and population data</span>



Think about the data we want to have from our bands data frame and how best to obtain it.


How do we pull the specific countries out of our world population DataFrame so we can add population information to our bands DataFrame? We need to use the merge function!!
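
Here is a minimal sketch of a merge followed by a per capita calculation. Both tables are made up, and the `n_bands` and `2015` column names are assumptions; the workshop data may use different names.

```python
import pandas as pd

# Made-up band counts per country (not the workshop data)
band_counts = pd.DataFrame({"country": ["Norway", "Finland"], "n_bands": [10, 20]})

# Made-up population table; the "2015" column name is an assumption
population = pd.DataFrame(
    {
        "country": ["Norway", "Finland", "Sweden"],
        "2015": [5_000_000, 5_000_000, 10_000_000],
    }
)

# merge matches rows on the shared "country" column; the default inner join
# keeps only the countries present in both tables
merged = band_counts.merge(population, on="country")
merged["bands_per_capita"] = merged["n_bands"] / merged["2015"]

# ascending=False puts the largest value first
print(merged.sort_values("bands_per_capita", ascending=False))
```

Sorting with the default `ascending=True` instead lists the countries with the fewest bands per capita first.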



What about the countries with the fewest metal bands per capita? 



## <span style='font-size:xx-large'>How many metal bands formed in each year? </span>

<span style='font-size:x-large'>An extra value\_counts exercise. Challenge: do it in one line!</span>
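
One possible one-liner, sketched on made-up data (the `formed` column name and the output column names are assumptions). Try the challenge yourself before reading it!

```python
import pandas as pd

# Made-up data (illustrative values only)
bands = pd.DataFrame(
    {"band": ["Alpha", "Beta", "Gamma"], "formed": [1990, 1985, 1990]}
)

# Count bands per year, order by year, then turn the result back into a DataFrame
per_year = bands["formed"].value_counts().sort_index().rename_axis("year").reset_index(name="n_bands")
print(per_year)
```

Here `rename_axis` names the index so that `reset_index` turns it into a column called `year`.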



# Exporting Data



In [0]:
# export cv data and metal DataFrame
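
A minimal sketch of exporting with `to_csv`, using a made-up table and a hypothetical file name. It writes to a temporary folder so nothing lands in your working directory.

```python
import os
import tempfile
import pandas as pd

# Made-up table (illustrative values only)
df = pd.DataFrame({"band": ["Alpha", "Beta"], "formed": [1985, 1992]})

# Write to a temporary folder; the file name is hypothetical
out_path = os.path.join(tempfile.mkdtemp(), "bands_clean.csv")

# index=False leaves the row labels out of the file
df.to_csv(out_path, index=False)

# Reading the file back is a quick check that the export worked
round_trip = pd.read_csv(out_path)
print(round_trip)
```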


# Summary



- Reading in our data and cleaning it
  - read\_csv\(\)
  - dropna\(\)
  - astype\(\)
- Looking at our DataFrame
  - head\(\)
  - tail\(\)  
  - info\(\)
- Selecting a particular column or columns
- Summarizing data
  - value\_counts\(\)
  - mean\(\)
  - median\(\)
  - mode\(\)
- Restructuring data
  - merge\(\)
  - rename\_axis\(\)
  - reset\_index\(\)
- Plotting data
  - plot\(\)
- Exporting data

What we didn't cover:

- groupby\(\)
- MultiIndex
- slicing  

