# Objective

Review the core `pandas` objects: `pandas.Series` and `pandas.DataFrame`.

# `pandas`
- Python package to wrangle and analyze tabular data
- built on top of NumPy
- core tool for data analysis in Python

In [1]:
# import pandas with standard abbreviation
import pandas as pd

# import numpy too!
import numpy as np


# Series

A `pandas.Series`:
- is one of the core data structures in `pandas`
- is a 1-dimensional array of *indexed data*
- will be the columns of the `pandas.DataFrame`
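
A small sketch of what "indexed data" means, using made-up values (the name `temps` and the day labels are only for illustration): the Series keeps the values and their index side by side, and a value can be looked up by its label.

```python
import pandas as pd

# hypothetical data: three temperatures labeled by day
temps = pd.Series([18.5, 21.0, 19.2], index=['mon', 'tue', 'wed'])

# the underlying values are a NumPy array
print(temps.values)

# the index holds the labels
print(temps.index)

# look up a value by its label
print(temps['tue'])
```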


# Creating a pandas Series

There are several ways to create a pandas Series.
For now, we will create Series using:
```
s = pd.Series(data, index = index)
```
- `data` = NumPy array (or a list of objects that can be converted to NumPy types)
- `index` = list of indices of the same length as `data`

In [3]:
# EX: a pandas series from a numpy array

# the np.arange() function constructs an array of consecutive integers
np.arange(3)

array([0, 1, 2])

In [4]:
# we can use this to create a pandas Series
pd.Series(np.arange(3), index = ['a', 'b', 'c'])

a    0
b    1
c    2
dtype: int64

What kind of parameter is `index`?

A: an optional parameter, meaning it has a default value.

If we don't specify `index`, the default is to number the elements starting from 0.

example:

In [5]:
# create a series from a list of strings with the default index

pd.Series( ['EDS 220', 'EDS 221', 'EDS 222'])

0    EDS 220
1    EDS 221
2    EDS 222
dtype: object
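
To double check the default, the sketch below (the variable name `courses` is only for illustration) looks at the index pandas builds when none is given. It should be a `RangeIndex` that starts at 0 and has one label per element.

```python
import pandas as pd

courses = pd.Series(['EDS 220', 'EDS 221', 'EDS 222'])

# with no index argument, pandas creates a RangeIndex
print(courses.index)

# the labels are consecutive integers starting at 0
print(list(courses.index))
```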

# Operations on Series
Arithmetic operations work on Series, and so do most NumPy functions.

example:

In [6]:
s = pd.Series([98, 73, 65], index = ['Andrea', 'Beth', 'Carolina'])
print(s, '\n')
# divide each element in the series by 10
print(s/10)

Andrea      98
Beth        73
Carolina    65
dtype: int64 

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64


In [7]:
s>70

Andrea       True
Beth         True
Carolina    False
dtype: bool

This is simple but important! Using conditions on Series is key to selecting data from dataframes.
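
As a quick sketch of where this leads (same grades as above, redefined so the cell runs on its own): a boolean Series like `s > 70` can be passed inside square brackets to keep only the elements where the condition is `True`. NumPy functions such as `np.mean` also accept a Series directly. The names `mask` and `passed` are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# boolean Series: True where the grade is above 70
mask = s > 70

# keep only the elements where the mask is True
passed = s[mask]
print(passed)

# NumPy functions work on a Series too
print(np.mean(s))
```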

# Attributes and Methods

Two examples about identifying missing values:

- missing values in pandas are represented by `np.nan` (NaN = not a number; the older spelling `np.NaN` was removed in NumPy 2.0)
- `NaN` is a float in NumPy

In [8]:
np.nan

nan

In [9]:
type(np.nan)

float

In [11]:
# series with NAs in it

s = pd.Series([1, 2, np.nan, 4, np.nan])
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

In [None]:
# check if the series has NaN values
s.hasnans

True

`isna()` = a method of Series, returns a Series indicating which elements are NAs


In [13]:
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool

`bool`: the dtype of this Series. `True` and `False` are **boolean values**.
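
Since `True` counts as 1 and `False` as 0, summing the output of `isna()` gives the number of missing values. A minimal sketch with the same Series (redefined here so the cell is self-contained). `dropna()` is another Series method; it returns a new Series without the missing values. The names `n_missing` and `s_clean` are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# number of missing values: True counts as 1, False as 0
n_missing = s.isna().sum()
print(n_missing)

# new Series with the NaN values removed (s itself is unchanged)
s_clean = s.dropna()
print(s_clean)
```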

# Dataframes

`pandas.DataFrame`

- most used object in `pandas`
- represents tabular data (think of spreadsheets)
- each column is a `pandas.Series`

# Creating a `pandas.DataFrame`

There are *many ways of creating a dataframe*; let's look at one of them.

Remember dictionaries? They are sets of key-value pairs:

```
{
key1: value1,
key2: value2
}
```

Think of a `pd.DataFrame` as a dictionary where:
- key = column names
- values = column values

We can create a dataframe like this:

In [17]:
# initialize dictionary with columns data
d = {
    'col_name_1': np.arange(3),
    'col_name_2': [3.1, 3.2, 3.3]
    }
d

{'col_name_1': array([0, 1, 2]), 'col_name_2': [3.1, 3.2, 3.3]}

In [26]:
# create data frame
df = pd.DataFrame(d, index = ['a', 'b', 'c'])
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3
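
To check that each column really is a `pandas.Series`, here is a small sketch that rebuilds the same dataframe and pulls out one column with square brackets (the name `col` is only for illustration). The column keeps the dataframe's index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'col_name_1': np.arange(3), 'col_name_2': [3.1, 3.2, 3.3]},
    index=['a', 'b', 'c']
)

# selecting a single column returns a Series
col = df['col_name_2']
print(type(col))

# the Series shares the dataframe's index
print(col.index)
```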


# In-place operations
Let's rename the dataframe's columns. We can use a dataframe *method* called `rename`, which takes a dictionary as input:

```
{
'col_1_old_name': 'col_1_new_name',
'col_2_old_name': 'col_2_new_name'
}
```

In [31]:
# define new column names (the keys must match the current column names)
col_names = {
    'col_name_1': 'col1',
    'col_name_2': 'col2'
}

# rename the columns using rename
df.rename(columns = col_names)

   col1  col2
a     0   3.1
b     1   3.2
c     2   3.3


In [None]:
# the output above shows the new names, but did df itself change?

In [25]:
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3


Nothing changed! `df.rename` doesn't change the column names *in place*, meaning it doesn't modify the object itself. Instead, it creates a new object as its output.

To actually change the dataframe, assign the output back to it or pass `inplace = True`:

In [None]:

df.rename(columns = col_names, inplace = True)
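
A sketch of the assign-back option, rebuilding the same dataframe so the cell runs on its own (`df2` and `renamed` are names used here only for illustration). Note that the keys of the dictionary have to match the current column names exactly; by default `rename` silently skips keys it cannot find.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {'col_name_1': np.arange(3), 'col_name_2': [3.1, 3.2, 3.3]},
    index=['a', 'b', 'c']
)
col_names = {'col_name_1': 'col1', 'col_name_2': 'col2'}

# rename returns a new dataframe; df2 is not modified
renamed = df2.rename(columns=col_names)
print(list(df2.columns))
print(list(renamed.columns))

# assign the output back to the same name to keep the change
df2 = df2.rename(columns=col_names)
print(list(df2.columns))
```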