# Pandas

Purpose: This notebook introduces Pandas, the main Python library for working with tabular data sets.

## Data frame from lists

In [None]:
import pandas as pd
name = ["Kirk", "Spock", "McCoy"]
age = [34, 35, 40]
rank = ['Captain','Science Officer','Chief Medical Officer']
ship = ['Intrepid','Enterprise', 'Enterprise']

df = pd.DataFrame(columns =['Name','Age','Rank','Ship'])
print(df)
print('')

df['Name'] = name
print(df)
print('')

df['Age'] = age
df['Rank'] = rank
df['Ship'] = ship
print(df)

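The column-by-column approach above can also be done in a single call. This is a minimal sketch, assuming the same `name`, `age`, `rank`, and `ship` lists: `zip` pairs up the i-th element of each list into one row, and `columns=` names the columns. The variable name `crew` is chosen here only so the example does not overwrite `df`.

```python
import pandas as pd

name = ["Kirk", "Spock", "McCoy"]
age = [34, 35, 40]
rank = ['Captain', 'Science Officer', 'Chief Medical Officer']
ship = ['Intrepid', 'Enterprise', 'Enterprise']

# zip() yields one tuple per crew member, e.g. ("Kirk", 34, "Captain", "Intrepid")
rows = list(zip(name, age, rank, ship))
crew = pd.DataFrame(rows, columns=['Name', 'Age', 'Rank', 'Ship'])
print(crew)
```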
## Indexing

Columns can be selected by name, much like looking up a key in a dictionary:

In [None]:
df["Name"]

In [None]:
df[["Name","Ship"]]

You can't select a row by number with the plain indexing operator: `df[0]` looks for a column labelled 0 and raises a `KeyError`.

In [None]:
df[0]  # raises KeyError: there is no column labelled 0

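To get a row by its position rather than a column by its name, use `.iloc`. The sketch below catches the `KeyError` raised by `[0]` and then uses `.iloc[0]` instead; the two-row DataFrame `demo` is made up for this illustration so that `df` is left untouched.

```python
import pandas as pd

# A small made-up DataFrame for this demonstration
demo = pd.DataFrame({'Name': ['Kirk', 'Spock'], 'Age': [34, 35]})

caught = None
try:
    demo[0]  # looks for a *column* labelled 0, which does not exist
except KeyError as err:
    caught = err
    print('KeyError:', err)

first_row = demo.iloc[0]  # selects the first *row* by position
print(first_row)
```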
## loc

The `.loc` indexer selects data differently from the plain indexing operator. It can select subsets of rows, subsets of columns, or both at once. Most importantly, it selects by *label*: the index labels of the rows and the names of the columns, not their integer positions. (With the default index, the row labels happen to be 0, 1, 2, and so on.)

In [None]:
print(df.loc[0])  # Returns the row whose index label is 0 (the first row here)
print('')
print(df.loc[[0, 1]])  # Returns the rows labelled 0 and 1

In [None]:
df = pd.DataFrame(age, index =['Kirk', 'Spock', 'McCoy'],
                columns =['Age'])
print(df)

In [None]:
print(df.loc['Kirk'])

In [None]:
print(df.loc['Kirk':'Spock'])

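Note that `df.loc['Kirk':'Spock']` returns both Kirk *and* Spock: slices in `.loc` include the end label, unlike Python list slices. A small sketch comparing label slicing with `.loc` and position slicing with `.iloc` on the same `Age` data (stored here in a new variable, `ages`):

```python
import pandas as pd

ages = pd.DataFrame([34, 35, 40], index=['Kirk', 'Spock', 'McCoy'], columns=['Age'])

by_label = ages.loc['Kirk':'Spock']  # end label included: Kirk, Spock
by_position = ages.iloc[0:1]         # end position excluded: Kirk only

print(by_label)
print(by_position)
```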
In [None]:
data = {  # named 'data' to avoid shadowing Python's built-in dict
    'people': ["Kirk", "Spock", "McCoy"],
    'ages': [34, 35, 40],
    'rank': ['Captain', 'Science Officer', 'Chief Medical Officer'],
    'ship': ['Intrepid', 'Enterprise', 'Enterprise']
}

print(data)
print('---')
df = pd.DataFrame(data)  # each key becomes a column name
print(df)

In [None]:
print(df.loc[0:2,["people","rank"]])

In [None]:
print(df.loc[0:1,'people':"rank"])

## iloc

In [None]:
print(df)

In [None]:
print(df.iloc[0, 0])  # Returns the value at row position 0, column position 0

In [None]:
print(df.iloc[0:2, 0:2])  # Returns row positions 0-1 and column positions 0-1 (end position excluded)

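`.iloc` is the positional counterpart of `.loc`: it selects by integer position, and its slices exclude the end position just like Python lists. As a sketch, rebuilding the crew DataFrame from the dictionary above, these two selections pick out the same cells (rows 0 and 1, columns `people` and `ages`):

```python
import pandas as pd

data = {
    'people': ["Kirk", "Spock", "McCoy"],
    'ages': [34, 35, 40],
    'rank': ['Captain', 'Science Officer', 'Chief Medical Officer'],
    'ship': ['Intrepid', 'Enterprise', 'Enterprise'],
}
crew = pd.DataFrame(data)

by_position = crew.iloc[0:2, 0:2]          # positions 0-1, end excluded
by_label = crew.loc[0:1, 'people':'ages']  # labels 0-1 and people..ages, end included

print(by_position.equals(by_label))
```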
## Sort

In [None]:
df.sort_values("rank")

In [None]:
df.sort_values("ages")

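`sort_values` returns a new, sorted DataFrame and leaves the original unchanged. A short sketch of two common options: `ascending=False` for descending order, and a list of columns to sort by one column and break ties with another. The crew data is rebuilt here so the block runs on its own.

```python
import pandas as pd

crew = pd.DataFrame({
    'people': ["Kirk", "Spock", "McCoy"],
    'ages': [34, 35, 40],
    'ship': ['Intrepid', 'Enterprise', 'Enterprise'],
})

oldest_first = crew.sort_values('ages', ascending=False)
# Sort by ship name first, then by age within each ship
by_ship_then_age = crew.sort_values(['ship', 'ages'])

print(oldest_first)
print(by_ship_then_age)
```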
## Filter

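Pandas filters rows with boolean masks: comparing a column to a value produces one True/False value per row, and that mask can then be used to select the matching rows.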

In [None]:
print(df)

In [None]:
filt = df['ship'] == 'Enterprise'  # Filter for those who are on the Enterprise
print(filt)  # filt is a Series of booleans, one per row

In [None]:
df.loc[filt]  # Passing the mask to .loc keeps only the rows where filt is True

We can also combine two filters, for example by ship and by age:

In [None]:
filt2 = df['ages'] < 40
print(filt2)

In [None]:
df.loc[filt & filt2]

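The masks can also be written inline. Each comparison must be wrapped in parentheses because `&` binds more tightly than `==` and `<`; use `|` for *or* and `~` for *not*, since Python's `and`/`or`/`not` do not work element-wise on a Series. A sketch on the same crew data:

```python
import pandas as pd

crew = pd.DataFrame({
    'people': ["Kirk", "Spock", "McCoy"],
    'ages': [34, 35, 40],
    'ship': ['Intrepid', 'Enterprise', 'Enterprise'],
})

# and: on the Enterprise and younger than 40
both = crew.loc[(crew['ship'] == 'Enterprise') & (crew['ages'] < 40)]
# or: on the Intrepid or aged 40 or over
either = crew.loc[(crew['ship'] == 'Intrepid') | (crew['ages'] >= 40)]
# not: everyone who is not on the Enterprise
not_enterprise = crew.loc[~(crew['ship'] == 'Enterprise')]

print(both, either, not_enterprise, sep='\n\n')
```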
We can also create a new column and set it only for the rows that satisfy the condition; the other rows are left as NaN:

In [None]:
df.loc[filt & filt2, 'Promotion'] = 'Eligible'
print(df)

# Cleaning Data

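The new `Promotion` column contains NaN (missing) values in every row that did not satisfy the condition. What do we do with them?

We can use `dropna()` to remove every row that contains a NaN: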

In [None]:
new_df = df.dropna()
print(new_df)

`dropna()` returns a new DataFrame and leaves the original unchanged, unless you pass `inplace=True` (or assign the result back to `df`):

In [None]:
df.dropna()  # returns a new DataFrame, which is discarded here; df is unchanged
print(df)

Alternatively, we can replace the NaN values with a default value using `fillna()`. Here `inplace=True` modifies `df` directly:

In [None]:
df.fillna('Not Eligible', inplace = True)
print(df)

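Two related idioms, sketched on a small made-up frame with a `Promotion` column like the one above: `isna().sum()` counts the missing values in each column, and calling `fillna` on a single column and assigning the result back fills only that column, without using `inplace`.

```python
import numpy as np
import pandas as pd

# Made-up data mirroring the Promotion column built earlier
crew = pd.DataFrame({
    'people': ["Kirk", "Spock", "McCoy"],
    'Promotion': [np.nan, 'Eligible', np.nan],
})

missing_counts = crew.isna().sum()  # number of NaN values per column
print(missing_counts)

# Fill only the Promotion column and assign the result back
crew['Promotion'] = crew['Promotion'].fillna('Not Eligible')
print(crew)
```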
# Reading a Data Frame from CSV/Excel

In [None]:
df2 = pd.read_csv('./Data/OrbitData.csv')
print(df2)

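Since `./Data/OrbitData.csv` may not be at hand, here is a self-contained sketch that parses CSV text from memory with `io.StringIO`; the column names and values are invented for illustration and are not the real orbit data. `pd.read_csv` accepts a file-like object as well as a path, and `pd.read_excel` works similarly for `.xlsx` files (it needs an Excel engine such as `openpyxl` installed).

```python
import io

import pandas as pd

# Hypothetical CSV contents, not taken from OrbitData.csv
csv_text = """No,Altitude
1,195.2
2,201.7
3,210.4
"""

orbits = pd.read_csv(io.StringIO(csv_text))
print(orbits.dtypes)  # numeric columns are parsed as numbers, not strings
high = orbits[orbits['Altitude'] > 200]
print(high)
```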
In [None]:
df2.info()

In [None]:
df2.head()

In [None]:
df2.tail()

In [None]:
cols = list(df2.columns)  # column names as a plain Python list

In [None]:
print(cols[-2])
df2[cols[-2]] > 200  # Altitude > 200

In [None]:
df2[df2[cols[-2]] > 200]

# Plotting

In [None]:
import matplotlib.pyplot as plt
plt.plot(df2[cols[0]], df2[cols[-2]], '.')  # Plot No. vs Altitude
plt.xlabel(cols[0])
plt.ylabel(cols[-2])

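Because `cols` is an ordinary Python list, indexing and slicing work as for any list: `cols[0]` is the first column name and `cols[-2]` is the second-to-last.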

# Plotting filtered data

In [None]:
filtdf = df2[df2[cols[-2]] > 200]
plt.plot(filtdf[cols[0]], filtdf[cols[-2]], '.')  # Plot No. vs Altitude > 200

In [None]:
filtdf2 = df2[df2[cols[0]] < 14200]
plt.plot(filtdf2[cols[0]], filtdf2[cols[-2]], '.')  # Plot No. vs Altitude for orbit No. < 14200

## Exercise

1 - Filter df2 (orbit data) for orbits in April. Plot Orbit Number vs Altitude.

2 - Filter df2 (orbit data) for orbits with SC Lon between 90 and 180 degrees. Plot Orbit Number vs Altitude.

3 - Combine the Ev, ent, UTC, PERI columns into datetimes using `datetime.datetime(year, month, day, hour, minute, second)`, and insert these datetimes as a new column in df2.