In [1]:
import pandas as pd

<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Pandas</p><br>

*pandas* is a Python library for data analysis. It offers a number of data exploration, cleaning and transformation operations that are critical in working with data in Python. 

*pandas* builds upon *numpy* and *scipy*, providing easy-to-use data structures and data manipulation functions with integrated indexing.

The main data structures *pandas* provides are *Series* and *DataFrames*. After a brief introduction to these two data structures and data ingestion, the key features of *pandas* this notebook covers are:
* Generating descriptive statistics on data
* Data cleaning using built-in pandas functions
* Frequent data operations for subsetting, filtering, insertion, deletion and aggregation of data
* Merging multiple datasets using dataframes
* Working with timestamps and time-series data

**Additional Recommended Resources:**
* *pandas* Documentation: http://pandas.pydata.org/pandas-docs/stable/
* *Python for Data Analysis* by Wes McKinney
* *Python Data Science Handbook* by Jake VanderPlas

Let's get started with our first *pandas* notebook!

# Data Series

A *pandas* Series is a one-dimensional labeled array.

In [2]:
ser=pd.Series(data=[100,200,300,400,500], index=['tom','bob','nancy','dan','eric'])

In [3]:
ser

tom      100
bob      200
nancy    300
dan      400
eric     500
dtype: int64

In [4]:
ser=pd.Series([100,200,300,400,500], ['tom','bob','nancy','dan','eric'])
ser

tom      100
bob      200
nancy    300
dan      400
eric     500
dtype: int64

*pandas* also accepts the data and the index as positional arguments, in that order, and builds the same Series from the two lists.

In [5]:
ser=pd.Series([100,"foo",300,"bar",500], ['tom','bob','nancy','dan','eric'])
ser

tom      100
bob      foo
nancy    300
dan      bar
eric     500
dtype: object
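
Mixing integers and strings is what turns the dtype into `object` in the output above. A minimal sketch of the contrast, reusing the notebook's values; the names `labels`, `numeric` and `mixed` are illustrative and not part of the original notebook:

```python
import pandas as pd

labels = ['tom', 'bob', 'nancy', 'dan', 'eric']

# All-integer data keeps a numeric dtype (int64 on most platforms).
numeric = pd.Series([100, 200, 300, 400, 500], index=labels)

# Mixed integers and strings fall back to the generic object dtype.
mixed = pd.Series([100, "foo", 300, "bar", 500], index=labels)

print(numeric.dtype)
print(mixed.dtype)
```

An `object` Series stores plain Python objects, which is why later arithmetic on it behaves element by element according to each value's own type.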

In [6]:
ser.index

Index(['tom', 'bob', 'nancy', 'dan', 'eric'], dtype='object')

In [9]:
ser['nancy']

300

In [10]:
ser[4,3,1]

KeyError: (4, 3, 1)
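
The `KeyError` above arises because `ser[4,3,1]` passes the tuple `(4, 3, 1)` as one single key, and no label with that value exists. A minimal sketch, assuming the same Series as in the cells above (`error_name` and `by_position` are illustrative names), that shows the failure and an explicit positional alternative with `iloc`:

```python
import pandas as pd

ser = pd.Series([100, "foo", 300, "bar", 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

# ser[4, 3, 1] is the same as ser[(4, 3, 1)]: one tuple used as a single key.
try:
    ser[4, 3, 1]
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

# Explicit positional selection takes a list of integer positions.
by_position = ser.iloc[[4, 3, 1]]

print(error_name)
print(by_position)
```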

In [11]:
ser.iloc[[3,4,1]]

dan     bar
eric    500
bob     foo
dtype: object

In [12]:
ser[["dan","eric","bob"]]

dan     bar
eric    500
bob     foo
dtype: object

In [13]:
ser.loc[["dan","eric","bob"]]

dan     bar
eric    500
bob     foo
dtype: object

In [14]:
"bob" in ser

True
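
The `in` operator on a Series tests the index labels, not the stored values. A short sketch under the same Series as above; checking the values through `ser.values` is one option among several:

```python
import pandas as pd

ser = pd.Series([100, "foo", 300, "bar", 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

label_found = "bob" in ser          # membership test against the index labels
value_as_label = 300 in ser         # 300 is a value, not a label
value_found = 300 in ser.values     # membership test against the stored values

print(label_found, value_as_label, value_found)
```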

In [15]:
ser*2

tom         200
bob      foofoo
nancy       600
dan      barbar
eric       1000
dtype: object

We can perform calculations directly on a *pandas* Series.

In [16]:
ser ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

In [17]:
ser[['nancy','eric']] ** 2

nancy     90000
eric     250000
dtype: object
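
Multiplication works on the mixed Series because Python strings support `*`, while `**` fails on the string entries. When the goal is to square whatever is numeric and skip the rest, one possible approach (an assumption here, not something the notebook itself does) is to coerce the Series with `pd.to_numeric` first:

```python
import pandas as pd

ser = pd.Series([100, "foo", 300, "bar", 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
numeric = pd.to_numeric(ser, errors="coerce")

# The power operation now applies to every row; NaN stays NaN.
squared = numeric ** 2
print(squared)
```

The result is a float Series, so the string entries are lost as missing values instead of raising a `TypeError`.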

# Pandas DataFrame

A *pandas* DataFrame is a two-dimensional labeled data structure.

Create a DataFrame from a dictionary of Python Series


In [18]:
# Lists keep the label order; a set is unordered, so values could land on arbitrary labels.
d= {'one':pd.Series([100.,200.,300.],index=["clock","ball","apple"]),
    'two':pd.Series([4000.,5000.,6000.,7000.],index=["dancy","ball","cerill","apple"])}


In [19]:
df=pd.DataFrame(d)
df

          one     two
apple   300.0  7000.0
ball    200.0  5000.0
cerill    NaN  6000.0
clock   100.0     NaN
dancy     NaN  4000.0
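
The output above shows index alignment: the DataFrame's row index is the union of the labels of both Series, and a label missing from one Series yields `NaN` in that column. A smaller sketch of the same behavior, with made-up labels and values chosen only for illustration:

```python
import pandas as pd

# Two Series that share only the label "b".
one = pd.Series([1.0, 2.0], index=["a", "b"])
two = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# The row index becomes the union of both indexes; gaps are filled with NaN.
frame = pd.DataFrame({"one": one, "two": two})
print(frame)
```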


In [20]:
df.index

Index(['apple', 'ball', 'cerill', 'clock', 'dancy'], dtype='object')

In [21]:
df.columns

Index(['one', 'two'], dtype='object')

In [22]:
pd.DataFrame(d,index=["dancy","ball","apple"])

         one     two
dancy    NaN  4000.0
ball   200.0  5000.0
apple  300.0  7000.0


In [23]:
pd.DataFrame(d,index=["dancy","ball","apple"],columns=["two","five"])

          two  five
dancy  4000.0   NaN
ball   5000.0   NaN
apple  7000.0   NaN


# Create a DataFrame from a list of Python dictionaries

In [26]:
data=[{'alex':1,'joe':2},{'ema':5,'dore':10,'alice':20}]
pd.DataFrame(data)

   alex  alice  dore  ema  joe
0   1.0    NaN   NaN  NaN  2.0
1   NaN   20.0  10.0  5.0  NaN


In [27]:
pd.DataFrame(data,index=["orange","red"])

        alex  alice  dore  ema  joe
orange   1.0    NaN   NaN  NaN  2.0
red      NaN   20.0  10.0  5.0  NaN


In [28]:
pd.DataFrame(data,columns=["alex",'ema'])

   alex  ema
0   1.0  NaN
1   NaN  5.0


# Basic DataFrame operations

In [30]:
df

          one     two
apple   300.0  7000.0
ball    200.0  5000.0
cerill    NaN  6000.0
clock   100.0     NaN
dancy     NaN  4000.0


In [32]:
df["one"]

apple     300.0
ball      200.0
cerill      NaN
clock     100.0
dancy       NaN
Name: one, dtype: float64

In [37]:
df["Three"]=df["one"] + df["two"]
df

          one     two   Three
apple   300.0  7000.0  7300.0
ball    200.0  5000.0  5200.0
cerill    NaN  6000.0     NaN
clock   100.0     NaN     NaN
dancy     NaN  4000.0     NaN


In [38]:
df['flag']= df['one']>250
df

          one     two   Three   flag
apple   300.0  7000.0  7300.0   True
ball    200.0  5000.0  5200.0  False
cerill    NaN  6000.0     NaN  False
clock   100.0     NaN     NaN  False
dancy     NaN  4000.0     NaN  False
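
In the output above, the rows where `one` is missing get `flag` equal to `False`, because any comparison with `NaN` evaluates to `False`. If missing inputs should stay missing in the flag, one possible variation (not part of the original notebook; all names are illustrative) masks the result with `notna()`:

```python
import numpy as np
import pandas as pd

one = pd.Series([300.0, 200.0, np.nan], index=["apple", "ball", "cerill"])

# Plain comparison: NaN > 250 is False.
flag = one > 250

# Keep the comparison result only where the input is present; elsewhere NaN.
flag_keep_missing = flag.where(one.notna())

print(flag)
print(flag_keep_missing)
```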


In [42]:
three=df.pop('Three')
three

apple     7300.0
ball      5200.0
cerill       NaN
clock        NaN
dancy        NaN
Name: Three, dtype: float64

In [43]:
df

          one     two   flag
apple   300.0  7000.0   True
ball    200.0  5000.0  False
cerill    NaN  6000.0  False
clock   100.0     NaN  False
dancy     NaN  4000.0  False


In [44]:
del df["two"]
df

          one   flag
apple   300.0   True
ball    200.0  False
cerill    NaN  False
clock   100.0  False
dancy     NaN  False
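
The last cells remove columns in two ways: `pop` removes a column and returns it as a Series, while `del` removes it without returning anything. Both change the DataFrame in place. A short sketch on a made-up frame that also shows `drop(columns=...)`, which returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

frame = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

popped = frame.pop("three")             # removed in place and returned
del frame["two"]                        # removed in place, nothing returned
trimmed = frame.drop(columns=["one"])   # new DataFrame; frame is unchanged

print(list(frame.columns))
print(list(trimmed.columns))
```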


In [49]:
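df["copy_of_one"]=df["one"]  # assumed: created in a cell not shown; it appears in the outputs below
df.insert(1,"copy_of1_one",df["one"])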

In [50]:
df


          one  copy_of1_one   flag  copy_of_one
apple   300.0         300.0   True        300.0
ball    200.0         200.0  False        200.0
cerill    NaN           NaN  False          NaN
clock   100.0         100.0  False        100.0
dancy     NaN           NaN  False          NaN


In [51]:
df.insert(3,"copy_of2_one",df["flag"]) # despite its name, this column copies "flag"

In [52]:
df

          one  copy_of1_one   flag  copy_of2_one  copy_of_one
apple   300.0         300.0   True          True        300.0
ball    200.0         200.0  False         False        200.0
cerill    NaN           NaN  False         False          NaN
clock   100.0         100.0  False         False        100.0
dancy     NaN           NaN  False         False          NaN


In [54]:
df.insert(3,"copy_of3_one",df["flag"]) # 3 is the zero-based column position (0, 1, 2, 3)

In [55]:
df

          one  copy_of1_one   flag  copy_of3_one  copy_of2_one  copy_of_one
apple   300.0         300.0   True          True          True        300.0
ball    200.0         200.0  False         False         False        200.0
cerill    NaN           NaN  False         False         False          NaN
clock   100.0         100.0  False         False         False        100.0
dancy     NaN           NaN  False         False         False          NaN


In [56]:
df["one_upper_half"]=df["one"][:2]
df

          one  copy_of1_one   flag  copy_of3_one  copy_of2_one  copy_of_one  one_upper_half
apple   300.0         300.0   True          True          True        300.0           300.0
ball    200.0         200.0  False         False         False        200.0           200.0
cerill    NaN           NaN  False         False         False          NaN             NaN
clock   100.0         100.0  False         False         False        100.0             NaN
dancy     NaN           NaN  False         False         False          NaN             NaN
