# Objective

Review the core `pandas` objects: `pandas.Series` and `pandas.DataFrame`

# `pandas`
- Python package to wrangle and analyze tabular data
- built on top of NumPy
- core tool for data analysis in Python

In [2]:
import pandas as pd

import numpy as np

# Series

A `pandas.Series`:

- is one of the core data structures in `pandas`
- a 1-dimensional array of *indexed* data
- is the type of each column in a `pandas.DataFrame`

# Creating a pandas Series

There are several ways to create a pandas Series.
For now, we will create one using:
```
s = pd.Series(data,index = index)
```
- `data` = a NumPy array (or a list of objects that can be converted to NumPy types)
- `index` = a list of index labels with the same length as `data`
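A quick check of the "same length" requirement: if `data` and `index` have different lengths, pandas refuses to build the Series and raises a `ValueError`. The values and labels below are arbitrary, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# data and index have the same length (3): works
s = pd.Series(np.arange(3), index=['a', 'b', 'c'])

# data has 3 values but index has only 2 labels: pandas raises ValueError
try:
    pd.Series(np.arange(3), index=['a', 'b'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(err)  # the exact message wording varies between pandas versions
```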


In [6]:
# Ex: a pandas series from a numpy array
# np.arange() constructs an array of consecutive integers
np.arange(3)

array([0, 1, 2])

In [7]:
# We can use this to create a pandas Series
pd.Series(np.arange(3), index = ['a','b','c'])

a    0
b    1
c    2
dtype: int64
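Because the data is *indexed*, elements can be retrieved by their label. A minimal sketch using `.loc`, pandas' label-based selector (not otherwise covered in these notes), on the same Series as above:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3), index=['a', 'b', 'c'])

# select a single element by its index label
value_b = s.loc['b']

# select several elements with a list of labels (the result is a Series)
subset = s.loc[['a', 'c']]
print(subset)
```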

What kind of parameter is `index`?

A: An optional parameter, since it has a default value.
If we don't specify `index`, pandas uses a default integer index starting at 0 (0, 1, 2, ...).
Example:


In [8]:
# create a series from a list of strings with default index
pd.Series(['EDS220','EDS222','EDS223','EDS242'])

0    EDS220
1    EDS222
2    EDS223
3    EDS242
dtype: object
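To see what the default index actually is, we can inspect the `index` attribute. A short sketch: pandas builds a `RangeIndex` running from 0 to `len(s) - 1`.

```python
import pandas as pd

s = pd.Series(['EDS220', 'EDS222', 'EDS223', 'EDS242'])

# the default index is a RangeIndex: 0, 1, ..., len(s) - 1
print(s.index)
print(list(s.index))  # [0, 1, 2, 3]
```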

# Operations on Series

Arithmetic operators work on Series, and so do most NumPy functions.

Example:

In [12]:
# define a series
s = pd.Series([98,73,65], index = ['Andrea','FA','FT'])
print(s,'\n')
# '\n' prints an extra blank line so the two outputs are visually separated

# divide each element in the series by 10:
print(s/10)

Andrea    98
FA        73
FT        65
dtype: int64 

Andrea    9.8
FA        7.3
FT        6.5
dtype: float64
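The example above only used an arithmetic operator. As a sketch of the "most NumPy functions" part, applying a NumPy function such as `np.floor` to a Series returns a new Series that keeps the original index labels (same scores as above):

```python
import numpy as np
import pandas as pd

s = pd.Series([98, 73, 65], index=['Andrea', 'FA', 'FT'])

# a NumPy function applied to a Series returns a Series with the same index
floored = np.floor(s / 10)
print(floored)

# arithmetic between a Series and a scalar is applied element-wise
curved = s + 2
print(curved)
```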


Example: create a new series with `True/False` values indicating whether 
the elements in the series satisfy a condition or not


In [13]:
s > 70

Andrea     True
FA         True
FT        False
dtype: bool

This is simple, but important! Using conditions on Series
is key to selecting data from DataFrames.
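Here is what that selection looks like in practice: a boolean Series can be used as an index, and only the elements where it is `True` are kept. A minimal sketch with the same scores:

```python
import pandas as pd

s = pd.Series([98, 73, 65], index=['Andrea', 'FA', 'FT'])

# boolean mask: True where the condition holds
mask = s > 70

# indexing with the mask keeps only the elements where mask is True
above_70 = s[mask]
print(above_70)
```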

## Attributes & Methods
Two examples for identifying missing values.

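- missing values in `pandas` are represented by `np.nan` (NaN = "not a number"); older code often writes `np.NaN`, an alias that was removed in NumPy 2.0
- `NaN` is a special floating-point value, so its type is `float`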

In [16]:
np.nan

nan

In [17]:
type(np.nan)

float
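One reason pandas provides dedicated tools for missing values: `NaN` follows the IEEE 754 floating-point rule that it compares unequal to everything, including itself, so `==` cannot be used to find it.

```python
import numpy as np

# NaN is not equal to anything, not even itself
same_as_itself = np.nan == np.nan
print(same_as_itself)  # False

# use a dedicated check instead of ==
detected = np.isnan(np.nan)
print(detected)  # True
```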

In [19]:
# series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan])
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

`hasnans` = an attribute of a pandas Series; it is `True` if there are any NAs:

In [20]:
# check if the series has NAs
s.hasnans

True

`isna()` = a method of a Series; it returns a Series indicating
which elements are NAs:

In [21]:
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
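The boolean Series returned by `isna()` is handy in its own right. A short sketch of two common follow-ups, using the pandas methods `sum()` and `dropna()` (neither is discussed above): counting the missing values and removing them.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# True counts as 1 and False as 0, so summing the mask counts the NAs
n_missing = s.isna().sum()
print(n_missing)  # 2

# dropna() returns a new Series without the missing values (s is unchanged)
clean = s.dropna()
print(clean)
```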

`bool`: `True` or `False`

# Dataframes

`pandas.DataFrame`:

- most used object in `pandas`
- represents tabular data (think of a spreadsheet)
- each column is a `pandas.Series`
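We can confirm that each column is a `pandas.Series` by selecting one column with square brackets. A small sketch; the column names `site` and `temp` and their values are hypothetical, chosen only for illustration:

```python
import pandas as pd

# hypothetical example data
df = pd.DataFrame({'site': ['A', 'B', 'C'], 'temp': [12.5, 14.0, 11.2]})

# selecting one column with [] returns a pandas.Series named after the column
temps = df['temp']
print(type(temps))
print(temps)
```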

# Creating a `pandas.DataFrame`

There are many ways of creating a DataFrame. Let's see one, which uses a dictionary.
Dictionaries are sets of key-value pairs:
```
{ key1 : value1,
  key2 : value2,
  key3 : value3
}
```

Think of a `pandas.DataFrame` as a dictionary where the
- keys = column names
- values = column values

We can create a DataFrame like this:

In [25]:
# initialize dictionary with columns' data
d = {'col_name_1' : np.arange(3),
     'col_name_2' : [3.1, 3.2, 3.3]
    }
d

{'col_name_1': array([0, 1, 2]), 'col_name_2': [3.1, 3.2, 3.3]}

In [26]:
df = pd.DataFrame(d)
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3
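To check that the dictionary keys really became the column names, we can look at the `columns` and `shape` attributes of the new DataFrame. A short sketch rebuilding the same DataFrame:

```python
import numpy as np
import pandas as pd

d = {'col_name_1': np.arange(3),
     'col_name_2': [3.1, 3.2, 3.3]}
df = pd.DataFrame(d)

# the dictionary keys became the column labels
print(list(df.columns))  # ['col_name_1', 'col_name_2']

# (number of rows, number of columns)
print(df.shape)  # (3, 2)
```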


# In-place operations
Let's rename the DataFrame's columns.
We can use the DataFrame method `rename`, which takes as input
(through its `columns` parameter) a dictionary mapping old names to new names:

```
{'col_1_old_name' : 'col_1_new_name',
 'col_2_old_name' : 'col_2_new_name'
}
```

In [34]:
# define the new column names
col_names = { 'col_name_1': 'col1',
              'col_name_2': 'col2'
            }

# rename the columns using rename()
df.rename(columns = col_names)

   col1  col2
0     0   3.1
1     1   3.2
2     2   3.3


In [32]:
# Take a look at our dataframe
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3


Nothing changed!
`df.rename` doesn't change the column names in place,
meaning it doesn't modify the object itself. Instead, it returns a new DataFrame as output.

To keep the new column names, assign the output back to the DataFrame:

In [33]:
df = df.rename(columns = col_names)
df

   col1  col2
0     0   3.1
1     1   3.2
2     2   3.3
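Besides reassigning the output, `rename` also accepts an `inplace=True` argument that modifies the DataFrame directly. A short comparison of the two approaches; note that with `inplace=True` the method returns `None`, so writing `df = df.rename(..., inplace=True)` would replace `df` with `None`.

```python
import numpy as np
import pandas as pd

col_names = {'col_name_1': 'col1', 'col_name_2': 'col2'}

# Option 1: rename returns a new DataFrame, which we assign back to df
df = pd.DataFrame({'col_name_1': np.arange(3), 'col_name_2': [3.1, 3.2, 3.3]})
df = df.rename(columns=col_names)

# Option 2: inplace=True modifies df2 itself and returns None
df2 = pd.DataFrame({'col_name_1': np.arange(3), 'col_name_2': [3.1, 3.2, 3.3]})
result = df2.rename(columns=col_names, inplace=True)

print(df.columns.tolist(), df2.columns.tolist(), result)
```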
