# Objective

In this lesson we cover the two core objects in the pandas library, the pandas.Series and the pandas.DataFrame. We will also learn methods to select data from our datasets.

## Pandas
- Python package to wrangle and analyze tabular data
- built on top of NumPy
- the core tool for doing data analysis in Python

In [2]:
import pandas as pd

# we will also import numpy 
import numpy as np

# Series
A pandas.Series:
- is one of the core data structures in pandas
- is a 1-dimensional array of indexed data
- will be the columns of the pandas.DataFrame
    
## Creating a Pandas Series
There are several ways to create a pandas Series. One general form is:
```
s = pd.Series(data, index=index)
```

- data = a NumPy array (or a list of objects that can be converted to NumPy types)
- index = a list of indices of the same length as data


In [4]:
#EX: a pandas series from a numpy array

# the np.arange() function constructs an array of consecutive integers
np.arange(3)

array([0, 1, 2])

In [5]:
# We can use this to create a pandas series
pd.Series(np.arange(3), index = ['a', 'b', 'c'])

a    0
b    1
c    2
dtype: int64
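
A Series like the one above stores two things: the data itself and the index that labels it. The following is a minimal sketch of how to look at each piece, assuming pandas and NumPy are installed. The variable name `s` is chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# the same Series as above, stored in a variable
s = pd.Series(np.arange(3), index=['a', 'b', 'c'])

# the underlying data is a one-dimensional NumPy array
print(s.values)
print(type(s.values))

# the labels are stored separately in the index
print(s.index)

# select a single element by its label
print(s['b'])
```

The `values` and `index` attributes are what make a Series a "1-dimensional array of indexed data".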

In [None]:
#What kind of parameter is index?
    #Answer: An optional parameter! If you don't pass index, pandas uses a default
    #index of integers starting at zero, but you can override it with your own labels

In [6]:
# Create a series from a list of strings with default index
pd.Series(['EDS220', 'EDS222', "EDS223", "EDS242"])

0    EDS220
1    EDS222
2    EDS223
3    EDS242
dtype: object
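
Another way to create a Series, not shown in the cells above, is from a Python dictionary. This is a small sketch. The enrollment numbers are invented purely for illustration.

```python
import pandas as pd

# hypothetical enrollment counts, used only for illustration
enrollment = {'EDS220': 30, 'EDS222': 28, 'EDS223': 25}

# the dictionary keys become the index, the values become the data
s = pd.Series(enrollment)
print(s)
```

Here there is no need to pass `index`, since the keys of the dictionary already provide the labels.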

# Operations and Series
Arithmetic operations and most NumPy functions work on series. For example:

In [10]:
# define a series
s = pd.Series([98,73,65],index=['Andrea', 'Beth', 'Carolina'])
print(s, '\n')

# divide each element in series by 10
print(s /10, '\n')

# take the exponential of each element in series
print(np.exp(s), '\n')

# notice this doesn't change the values of our series
print(s)

Andrea      98
Beth        73
Carolina    65
dtype: int64 

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64


In [11]:
# Create a boolean (True/False) series
s>70

Andrea       True
Beth         True
Carolina    False
dtype: bool
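
A boolean series like this one is useful for selecting data. The sketch below passes it inside square brackets to keep only the elements where the condition is True. The names `mask` and `selected` are chosen here for illustration.

```python
import pandas as pd

s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# boolean Series: True where the value is greater than 70
mask = s > 70

# keep only the elements where the mask is True
selected = s[mask]
print(selected)
```

The same thing can be written in one step as `s[s > 70]`.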

## Attributes and Methods

pandas.Series have many attributes and methods; you can see a full list in the pandas documentation. For now we will cover two examples that have to do with identifying missing values.

pandas represents a missing or NA value with NaN, which stands for not a number. Let’s construct a small series with some NA values:

In [15]:
# series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan])
print(s, '\n')

# check if series has NAs
print(s.hasnans, '\n')

#which elements in the series are NAs
s.isna()

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64 

True 



0    False
1    False
2     True
3    False
4     True
dtype: bool
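
Two common follow-up steps are counting the missing values and removing them. This is a minimal sketch using `isna()` together with `sum()`, plus the `dropna()` method, which is not covered above. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# True counts as 1 when summed, so this is the number of NAs
n_missing = s.isna().sum()
print(n_missing)

# dropna() returns a new Series without the NAs; s itself is unchanged
s_clean = s.dropna()
print(s_clean)
```

Note that `s_clean` keeps the original index labels of the elements that remain.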

# Data Frames
We already mentioned that each column of a pandas.DataFrame is a pandas.Series. In fact, a pandas.DataFrame can be thought of as a dictionary of pandas.Series, with each column name being a key and the column's values being that key's value:
- key: column name
- value: column values

Thus, we can create a pandas.DataFrame in this way:

In [20]:
# initialize dictionary with columns' data 
d = {'index' : pd.Series(np.arange(3)),
     'prob#' : pd.Series([3.1, 3.2, 3.3]),
     'title' : pd.Series(['Problem3 Example1', 'Problem3 Example2', 'Problem3 Example3'])
     }

# create data frame
df = pd.DataFrame(d)

print(df.index)

# change the index
df.index = ['a','b','c']
df

RangeIndex(start=0, stop=3, step=1)


   index  prob#              title
a      0    3.1  Problem3 Example1
b      1    3.2  Problem3 Example2
c      2    3.3  Problem3 Example3
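
Because a data frame behaves much like a dictionary of series, a column can be selected by its name the same way a value is looked up by its key. The sketch below rebuilds the data frame from the previous cell so it can run on its own. The variable name `col` is chosen for illustration.

```python
import numpy as np
import pandas as pd

d = {'index': pd.Series(np.arange(3)),
     'prob#': pd.Series([3.1, 3.2, 3.3]),
     'title': pd.Series(['Problem3 Example1', 'Problem3 Example2', 'Problem3 Example3'])}
df = pd.DataFrame(d)
df.index = ['a', 'b', 'c']

# selecting a column by name works like looking up a key in a dictionary
col = df['prob#']

# the result is a pandas.Series that shares the data frame's index
print(type(col))
print(col)
```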


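# In-place Operations
An in-place operation modifies the object it is called on. By default, rename is not an in-place operation: it returns a modified copy that you can output, but the original data frame does not change because the result is not being stored. To keep the changes, assign the result back to a variable or pass `inplace=True`.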
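
A short sketch of this behavior, assuming a small data frame with a single column. The new column name `problem_number` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'prob#': [3.1, 3.2, 3.3]})

# rename returns a new data frame with the column renamed
renamed = df.rename(columns={'prob#': 'problem_number'})
print(renamed.columns)

# the original data frame still has its old column name
unchanged = list(df.columns)
print(unchanged)

# assign the result back to the variable to keep the change
df = df.rename(columns={'prob#': 'problem_number'})
print(df.columns)
```

Assigning the result back, as in the last step, is usually easier to follow than `inplace=True`, since each line makes clear which object holds the updated data.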