# Introduction to Pandas

Pandas is a Python library for data analysis. It can be thought of as a much more powerful, programmable version of Excel with many additional features. Topics:

* Introduction to Pandas
* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output
___

# Series
A Series object:
* Is similar to a NumPy array (it is built on top of the NumPy array object).
* Unlike a NumPy array, can have axis labels, so each element can be accessed by a label rather than only by its integer position.
* Can hold arbitrary Python objects, not just numeric data.
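
The contrast between the two objects can be sketched as follows. This is a minimal illustration, and the names `positions_only` and `labelled` are chosen here for the example only:

```python
import numpy as np
import pandas as pd

# A NumPy array is addressed by integer position only
positions_only = np.array([10, 20, 30])
second_by_position = positions_only[1]

# A Series built from the same data can also be addressed by label
labelled = pd.Series(positions_only, index=['a', 'b', 'c'])
second_by_label = labelled['b']

# The underlying values are still available as a NumPy array
values = labelled.to_numpy()

print(second_by_position, second_by_label, type(values).__name__)
```

Both lookups return the same element; the Series simply adds a label-based way to reach it.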

Examples:

In [3]:
import numpy as np
import pandas as pd

### Creating a Series
A Series can be created from a list, a NumPy array, or a dictionary:

In [4]:
# Four Python objects used in the examples below: labels, a list, a NumPy array, and a dictionary
lab = ['a','b','c']
lis = [10,20,30]
arr = np.array([11,22,33])
dic = {'a':10,'b':20,'c':30}

**List to Series**

In [5]:
# Pass data from a list to a series
pd.Series(data=lis)

0    10
1    20
2    30
dtype: int64

In [4]:
pd.Series(lis)

0    10
1    20
2    30
dtype: int64

In [9]:
# Use lab to specify index
pd.Series(data=lis,index=lab)

a    10
b    20
c    30
dtype: int64

In [10]:
# Shorter form: data and index can be passed positionally, in that order
pd.Series(lis,lab)

a    10
b    20
c    30
dtype: int64

**NumPy Arrays (npa) to Series**

In [15]:
# Pass data from a NumPy array to a Series
pd.Series(arr)

0    11
1    22
2    33
dtype: int32

In [16]:
pd.Series(arr,lab)

a    11
b    22
c    33
dtype: int32
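
The `int32` in the two outputs above comes from the NumPy array itself, and NumPy's default integer size depends on the platform and NumPy version, so the same cells may report `int64` on another machine. A minimal sketch, assuming you want the same dtype everywhere, is to request it explicitly (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Fix the dtype when building the array ...
arr64 = np.array([11, 22, 33], dtype=np.int64)
s64 = pd.Series(arr64, index=['a', 'b', 'c'])

# ... or ask pandas for a specific dtype when building the Series
s32 = pd.Series([11, 22, 33], index=['a', 'b', 'c'], dtype='int32')

print(s64.dtype, s32.dtype)
```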

**Dictionary to Series**

In [19]:
dic

{'a': 10, 'b': 20, 'c': 30}

In [18]:
pd.Series(dic)

a    10
b    20
c    30
dtype: int64
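
When a dictionary is converted, its keys become the index. As a hedged aside, passing an explicit `index` together with a dictionary selects and orders entries by key, and a label with no matching key ends up as `NaN`. The label `'z'` below is a made-up key used only to show that case:

```python
import pandas as pd

dic = {'a': 10, 'b': 20, 'c': 30}

# Keep 'c' and 'a' in that order; 'z' is not a key in dic
picked = pd.Series(dic, index=['c', 'a', 'z'])
print(picked)
```

Because `NaN` is a floating-point value, the resulting Series is stored as floats rather than integers.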

### Data in a Series

**A pandas Series can hold a variety of data object types:**

In [20]:
# Example: a Series holding strings (the label list) as its data
pd.Series(data=lab)

0    a
1    b
2    c
dtype: object

In [21]:
# A Series can even hold built-in functions (rarely useful in practice)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

___
## Using an Index

* The index is the key to understanding how a Series is used.
* Pandas uses index labels (names or numbers) to look up values quickly, much like a hash table or dictionary.

Example: Retrieving information from a Series.

In [23]:
# Data points are [1,2,3,4]; the index holds the country names
s1 = pd.Series([1,2,3,4],['USA', 'Germany','Korea', 'Japan'])
s1

USA        1
Germany    2
Korea      3
Japan      4
dtype: int64

In [24]:
# Data points are [6,2,5,4]; the index holds a different set of country names
s2 = pd.Series([6,2,5,4],index = ['Austria', 'Germany','Italy', 'Japan'])
s2

Austria    6
Germany    2
Italy      5
Japan      4
dtype: int64

In [25]:
# Retrieve a data point by its index label
s1['Korea']

3
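
The square-bracket lookup above is label-based. A small sketch of some related lookups, using the `.loc` and `.iloc` accessors to make the label-versus-position choice explicit (the Series is rebuilt here so the block runs on its own):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'Korea', 'Japan'])

by_label = s1.loc['Korea']          # lookup by index label
by_position = s1.iloc[2]            # lookup by integer position
several = s1.loc[['USA', 'Japan']]  # a list of labels returns a Series
has_italy = 'Italy' in s1           # membership is tested against the index

print(by_label, by_position, has_italy)
print(several)
```

As with a dictionary, `in` checks the labels, not the stored values.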

Operations between Series are also aligned on the index:

In [26]:
# Labels present in only one Series produce NaN; the result is float64 because NaN is a float
s1 + s2

Austria    NaN
Germany    4.0
Italy      NaN
Japan      8.0
Korea      NaN
USA        NaN
dtype: float64
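
If the `NaN` entries are not wanted, one option is the `Series.add` method with `fill_value`, which substitutes a value for a label missing from one side before adding. This is a sketch that assumes a missing country should count as 0, which may not suit every dataset:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'Korea', 'Japan'])
s2 = pd.Series([6, 2, 5, 4], index=['Austria', 'Germany', 'Italy', 'Japan'])

# Treat a label missing from one Series as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
```

Labels found in both Series are summed as before; labels found in only one keep their original value.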