# Pandas
## Dataframes - initialization

In [3]:
import pandas as pd
import numpy as np

**A DataFrame is an analog of a two-dimensional NumPy array with both flexible 
row indices and flexible column names.
It can also be thought of as a dictionary of Series structures that share the same index.**
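
A minimal sketch of these two views of a DataFrame. The Series names and values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two Series sharing the same index (hypothetical values, for illustration only)
foo = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
bar = pd.Series([10.0, 20.0, 30.0], index=['a', 'b', 'c'])

frame = pd.DataFrame({'foo': foo, 'bar': bar})

# Dictionary-style view: a column name maps to a Series
col = frame['foo']
print(type(col).__name__)              # Series
print(col.index.equals(frame.index))   # True

# Array-style view: the values form a two-dimensional NumPy array
print(frame.to_numpy().shape)          # (3, 2)
```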


**Any list of dictionaries can be made into a DataFrame.
Even if some keys are missing from some of the dictionaries, Pandas fills them in with NaN.**


In [5]:
data = pd.DataFrame([{'a': 1, 'b': 2}, 
                     {'b': 3, 'c': 4}])
data

Unnamed: 0,a,b,c
0,1.0,2,
1,,3,4.0


**DataFrame from a two-dimensional NumPy array**

In [7]:
data = pd.DataFrame(np.random.rand(3, 2), 
                    columns=['foo', 'bar'],
                    index=['a', 'b', 'c']
                   )

data

Unnamed: 0,foo,bar
a,0.509435,0.562484
b,0.047849,0.927211
c,0.68199,0.748786


**Create a DataFrame from Series objects**

In [9]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

df = pd.DataFrame({'area': area, 'population': population })
df.head()

Unnamed: 0,area,population
Alaska,1723337.0,
California,423967.0,38332521.0
New York,,19651127.0
Texas,695662.0,26448193.0


**Index objects follow many of the conventions of Python’s built-in set data structure, 
so that unions, intersections, and differences can be computed in a familiar way**

In [11]:
indA = pd.Index([1, 3, 5, 7, 9]) 
indB = pd.Index([2, 3, 5, 7, 11])

**Intersection**

In [13]:
# In pandas 2.x the & operator is element-wise, so use the method form
indA.intersection(indB)

Int64Index([3, 5, 7], dtype='int64')

**Union**

In [15]:
# In pandas 2.x the | operator is element-wise, so use the method form
indA.union(indB)

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

**Symmetric difference**

In [16]:
# Elements that appear in exactly one of the two indexes
indA.symmetric_difference(indB)

Int64Index([1, 2, 9, 11], dtype='int64')
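
The set operations on Index objects are all available as methods. The sketch below reuses `indA` and `indB` and also shows the plain (one-sided) difference, which is distinct from the symmetric difference. The result variable names are hypothetical.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

common = indA.intersection(indB)             # in both
either = indA.union(indB)                    # in at least one
only_a = indA.difference(indB)               # in indA but not in indB
one_side = indA.symmetric_difference(indB)   # in exactly one of the two

print(list(common))    # [3, 5, 7]
print(list(either))    # [1, 2, 3, 5, 7, 9, 11]
print(list(only_a))    # [1, 9]
print(list(one_side))  # [1, 2, 9, 11]
```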

In [17]:
data.values

array([[0.50943499, 0.56248398],
       [0.04784851, 0.92721099],
       [0.68199033, 0.74878574]])

In [19]:
data.columns

Index(['foo', 'bar'], dtype='object')

**List data items**

In [21]:
list(data.items())

[('foo',
  a    0.509435
  b    0.047849
  c    0.681990
  Name: foo, dtype: float64),
 ('bar',
  a    0.562484
  b    0.927211
  c    0.748786
  Name: bar, dtype: float64)]

**Slicing**

**Notice that when you slice with an explicit index (e.g., data['a':'c']), 
the final index is included in the slice, while when you slice with an implicit 
index (e.g., data[0:2]), the final index is excluded from the slice.**
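
A small sketch of the difference, using a hypothetical Series `s`. Here `iloc` is used for the positional slice so that the intent is unambiguous.

```python
import pandas as pd

# Hypothetical Series with string labels, for illustration only
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

by_label = s['a':'c']        # label slice: the end label 'c' is included
by_position = s.iloc[0:2]    # positional slice: position 2 is excluded

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```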


In [22]:
data['bar']['a':'b']

a    0.562484
b    0.927211
Name: bar, dtype: float64

**Transpose**

In [24]:
data.T

Unnamed: 0,a,b,c
foo,0.509435,0.047849,0.68199
bar,0.562484,0.927211,0.748786


**Array-style indexing with iloc to access rows and columns by integer position**

In [26]:
data.iloc[:2,:1]

Unnamed: 0,foo
a,0.509435
b,0.047849


**Array-style indexing with loc to access rows and columns by label**

In [27]:
data.loc[:'b', :'foo']

Unnamed: 0,foo
a,0.509435
b,0.047849


In [29]:
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'area': area, 'pop': pop})
data['density'] = data['pop'] / data['area']
data

Unnamed: 0,area,pop,density
California,423967,38332521,90.413926
Texas,695662,26448193,38.01874
New York,141297,19651127,139.076746
Florida,170312,19552860,114.806121
Illinois,149995,12882135,85.883763


**With loc, select rows (optionally with a boolean mask), then columns by name**

In [30]:
data.loc[data.density > 100, ['pop', 'density']]

Unnamed: 0,pop,density
New York,19651127,139.076746
Florida,19552860,114.806121


In [32]:
data[data['density']>100]['pop']

New York    19651127
Florida     19552860
Name: pop, dtype: int64
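
The two selection styles return the same values when reading. The sketch below rebuilds the same table with the figures from this section and compares a single `loc` call with the two-step form. The names `mask`, `one_step`, and `two_step` are hypothetical.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'area': [423967, 695662, 141297, 170312, 149995],
     'pop': [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=states,
)
data['density'] = data['pop'] / data['area']

mask = data['density'] > 100        # boolean Series, one entry per row

one_step = data.loc[mask, 'pop']    # rows and column selected in a single call
two_step = data[mask]['pop']        # filter the rows first, then pick the column

print(one_step.equals(two_step))    # True
print(list(one_step.index))         # ['New York', 'Florida']
```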

**Access first x rows**