## Introduction to Pandas 2

In [1]:
import numpy as np
import pandas as pd

### Series as dict

In [5]:
ser_as_dict = pd.Series(['value_1', 'value_2', 'value_3'], index = ['a','b','c'])
ser_as_dict['b']

'value_2'

In [6]:
'a' in ser_as_dict

True

In [8]:
ser_as_dict['d'] = "value_4"
ser_as_dict

a    value_1
b    value_2
c    value_3
d    value_4
dtype: object

### Series as One-dimensional Array

In [10]:
ser_as_dict['a':'c'] # slicing with explicit indices

a    value_1
b    value_2
c    value_3
dtype: object

In [22]:
ser_as_dict[0:4] # slicing with implicit integer positions

a    value_1
b    value_2
c    value_3
d    value_4
dtype: object

In [24]:
(ser_as_dict == 'value_3').sum() # count the entries equal to 'value_3'

1

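Remember that when a Series has an explicit integer index, plain indexing (ser[1]) uses the explicit labels, while plain slicing (ser[1:3]) uses the implicit positions. To avoid this ambiguity, use .loc to always refer to the explicit, hand-assigned index and .iloc to always refer to the implicit, zero-based positions.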
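
A minimal sketch of the difference between the two accessors, assuming a small Series whose integer labels do not match its positions. The name `ser_int` and its values are illustrative only and are not part of the notebook above.

```python
import pandas as pd

# Hypothetical Series: integer labels (1, 3, 5) differ from positions (0, 1, 2)
ser_int = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

label_lookup = ser_int.loc[1]        # explicit label 1 -> first element
position_lookup = ser_int.iloc[1]    # implicit position 1 -> second element

label_slice = ser_int.loc[1:3]       # label slice: the end label is included
position_slice = ser_int.iloc[1:3]   # position slice: the end position is excluded

print(label_lookup, position_lookup)
print(list(label_slice.index), list(position_slice.index))
```

Note that the .loc slice includes its end label while the .iloc slice follows the usual Python convention and excludes its end position.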

### Data Selection in Data Frames

In [25]:
# Series of state areas
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})

# Series of state populations
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})

# DataFrame combining area and population, aligned on the state index
data = pd.DataFrame({'area': area, 'pop': pop})

data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [26]:
data['area']

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

In [27]:
data['pop']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: pop, dtype: int64

In [33]:
data['density'] = data['pop'] / data['area']
data

              area       pop     density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


### Selecting a single observation

In [45]:
data.loc['California']

area       4.239670e+05
pop        3.833252e+07
density    9.041393e+01
Name: California, dtype: float64

In [50]:
data.iloc[0]

area       4.239670e+05
pop        3.833252e+07
density    9.041393e+01
Name: California, dtype: float64

In [51]:
data.values[0]

array([4.23967000e+05, 3.83325210e+07, 9.04139261e+01])

### Finding states with a density over 100

In [53]:
data.loc[data.density > 100]

            area       pop     density
New York  141297  19651127  139.076746
Florida   170312  19552860  114.806121
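
The same mask can be combined with a column selection in a single .loc call. The sketch below rebuilds the `data` frame from the cells above so that it runs on its own; the name `dense` is illustrative only.

```python
import pandas as pd

# Rebuild the DataFrame used in this section
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'area': area, 'pop': pop})
data['density'] = data['pop'] / data['area']

# Row mask and column list in one .loc call:
# keep states with density above 100, and only two of the columns
dense = data.loc[data['density'] > 100, ['pop', 'density']]
print(dense)
```

Using data['density'] rather than data.density in the mask avoids clashes between column names and DataFrame attributes (a column named pop, for example, would collide with the DataFrame.pop method).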


### UFuncs: index preservation

In [58]:
rng = np.random.RandomState(42) # seed the generator with 42 for reproducibility
ser = pd.Series(rng.randint(0, 10, 4)) # four random integers in [0, 10)
ser

0    6
1    3
2    7
3    4
dtype: int64
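
As a minimal sketch of what index preservation means here, applying a NumPy ufunc such as np.exp to the Series should return another Series carrying the same index. The name `result` is illustrative and not part of the notebook above.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)          # same seeded generator as above
ser = pd.Series(rng.randint(0, 10, 4))   # four random integers in [0, 10)

# Apply a NumPy ufunc directly to the Series
result = np.exp(ser)

print(type(result).__name__)             # the result is still a pandas Series
print(result.index.equals(ser.index))    # the index labels are carried over
```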
