## Pandas

- NumPy is primarily for numerical data.
- pandas handles numerical data as well; it is built on top of NumPy.
- pandas is also set up to handle non-numeric data (text, missing values, mixed types) better than NumPy.

- pandas has two main data structures:
- Series: one-dimensional
- DataFrame: two-dimensional, mutable, and can hold a different data type in each column
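
A minimal sketch of the two structures described above. The ingredient names and column names are made up for illustration; they are not from any dataset used in these notes.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values plus an index.
amounts = pd.Series([4, 1, 2, 1], name="amount")

# A DataFrame is two-dimensional, and each column can hold a different type.
# The column names and values here are hypothetical.
recipe = pd.DataFrame({
    "ingredient": ["flour", "sugar", "eggs", "tomatoes"],
    "amount": [4, 1, 2, 1],
    "unit": ["cups", "cup", "large", "can"],
})

print(amounts.ndim, recipe.ndim)   # number of dimensions of each structure
print(recipe.dtypes)               # one dtype per column
```

Selecting a single column of a DataFrame, for example `recipe["amount"]`, gives back a Series.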

## Pandas Series

In [6]:
import pandas as pd

In [8]:
pd.Series(['4 cups', '1 cup', '2 large', '1 can'])

0     4 cups
1      1 cup
2    2 large
3      1 can
dtype: object

In [11]:
s = pd.Series(data =[1, "2", 3, 4, "5", 6, 7, 8, "99", "100"])

In [13]:
print(s)

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8     99
9    100
dtype: object


In [17]:
x = s.astype('int')

`astype('int')` converts every entry, including the strings, to an integer, so numeric methods such as `mean()` work.

In [18]:
x.mean()

np.float64(23.5)
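
`astype('int')` only works when every entry can be parsed as an integer. As a sketch of one alternative for messier input, `pd.to_numeric` with `errors="coerce"` replaces unparseable entries with `NaN` instead of raising. The `"n/a"` entry below is a made-up example of bad input.

```python
import pandas as pd

s = pd.Series([1, "2", 3, "99"])       # mixed ints and strings -> dtype object
as_int = s.astype("int")                # works because every entry parses as an integer

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
messy = pd.Series([1, "2", "n/a", 4])   # "n/a" is a hypothetical bad value
cleaned = pd.to_numeric(messy, errors="coerce")

print(as_int.mean())
print(cleaned)
```

Because `NaN` is a float, the coerced result is a float Series rather than an integer one.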

In [41]:
data = pd.Series([1,2,pd.NA, 4,5])

In [34]:
data.dropna(inplace = True)

In [23]:
print(data)

0    1
1    2
3    4
4    5
dtype: object


In [42]:
# data was re-created in In [41] before this cell ran, so the missing value is still present
data.fillna('Null', inplace = True)

In [43]:
print(data)

0       1
1       2
2    Null
3       4
4       5
dtype: object
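
Both `dropna` and `fillna` can also be used without `inplace = True`, in which case they return a new Series and leave the original alone. The sketch below assumes the nullable `"Int64"` dtype (an addition, not used in the cells above) so the values stay integers even with a missing entry.

```python
import pandas as pd

# dtype="Int64" is an assumption made here; it keeps integers alongside pd.NA.
data = pd.Series([1, 2, pd.NA, 4, 5], dtype="Int64")

dropped = data.dropna()     # new Series without the missing entry; data is unchanged
filled = data.fillna(0)     # new Series with the missing entry replaced by 0

print(data.isna().sum())    # number of missing values still in the original
print(dropped.index.tolist())
print(filled.tolist())
```

Note that `dropna` keeps the original index labels, so the label of the dropped row is simply absent afterwards.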


In [35]:
import numpy as np
# fails: data has dtype object here, not a numeric dtype
data.apply(np.sqrt)

TypeError: loop of ufunc does not support argument 0 of type int which has no callable sqrt method
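
The error above most likely comes from the Series having dtype `object`: NumPy then tries to call a `sqrt` method on each Python `int`, which does not exist. A sketch of one way around it, assuming it is acceptable to drop the missing entry and work with floats:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, pd.NA, 4, 5])        # dtype object because of pd.NA

# Drop the missing entry, then convert to a numeric dtype before applying the ufunc.
numeric = data.dropna().astype("float64")
roots = numeric.apply(np.sqrt)

print(numeric.dtype)
print(roots)
```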

In [37]:
data = [1,2,3,4,5]

In [38]:
data = pd.Series([1,2,3,4,5])

In [39]:
data.apply(np.sqrt)

0    1.000000
1    1.414214
2    1.732051
3    2.000000
4    2.236068
dtype: float64

In [40]:
data.apply(lambda x: x+1)

0    2
1    3
2    4
3    5
4    6
dtype: int64
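
For simple arithmetic, `apply` with a lambda is not strictly needed. A short sketch comparing it with the vectorized form, which operates on the whole Series at once:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

via_apply = data.apply(lambda x: x + 1)   # calls the lambda once per element
vectorized = data + 1                      # same result, computed on the whole Series

print(via_apply.equals(vectorized))
print(np.sqrt(data))                       # NumPy ufuncs also accept a numeric Series directly
```

`apply` is still useful when the per-element logic cannot be written as a single arithmetic expression.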

## Pandas DataFrame

In [75]:
data = pd.read_csv('RTdata.csv')

In [46]:
print(data)

    runcode  subjs sex       race       RTs         K
0     23887      1   m      asian  0.268098  0.254095
1     23888      2   f  caucasian  0.810172  1.020760
2     23889      3   m  caucasian  0.625572  2.882098
3     23890      4   f      asian  0.892729  1.024061
4     23891      5   m  caucasian  0.495700  2.093723
5     23892      6   f  caucasian  0.117297  1.012419
6     23893      7   f  caucasian  0.964358  2.904390
7     23894      8   m  caucasian  0.131785  1.922144
8     23895      9   f      asian  0.529800  2.359120
9     23896     10   f      asian  0.917709  2.088641
10    23897     11   m      asian  0.245590  1.693146
11    23898     12   f      asian  0.815274  1.072074
12    23899     13   m  caucasian  0.350466  3.813035
13    23900     14   m  caucasian  0.000590  2.791595
14    23901     15   f      asian  0.960922  0.064458
15    23902     16   m      asian  0.111476  4.186431
16    23903     17   f      asian  0.307025  3.112951
17    23904     18   m  caucasian  0.641817  4.045537

In [47]:
data['subjs']

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
Name: subjs, dtype: int64

In [48]:
data['subjs'][5]

np.int64(6)

In [73]:
data = pd.read_csv('RTdata.csv', index_col = 'subjs')

In [74]:
print(data)

       runcode sex       race       RTs         K
subjs                                            
1        23887   m      asian  0.268098  0.254095
2        23888   f  caucasian  0.810172  1.020760
3        23889   m  caucasian  0.625572  2.882098
4        23890   f      asian  0.892729  1.024061
5        23891   m  caucasian  0.495700  2.093723
6        23892   f  caucasian  0.117297  1.012419
7        23893   f  caucasian  0.964358  2.904390
8        23894   m  caucasian  0.131785  1.922144
9        23895   f      asian  0.529800  2.359120
10       23896   f      asian  0.917709  2.088641
11       23897   m      asian  0.245590  1.693146
12       23898   f      asian  0.815274  1.072074
13       23899   m  caucasian  0.350466  3.813035
14       23900   m  caucasian  0.000590  2.791595
15       23901   f      asian  0.960922  0.064458
16       23902   m      asian  0.111476  4.186431
17       23903   f      asian  0.307025  3.112951
18       23904   m  caucasian  0.641817  4.045537
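
A sketch of what `index_col` changes, using the first three rows of the table inlined as text so it runs without `RTdata.csv`. That the file is comma-separated with exactly this header is an assumption.

```python
import io
import pandas as pd

# First three rows of the table, inlined; the CSV layout is assumed.
csv_text = """runcode,subjs,sex,race,RTs,K
23887,1,m,asian,0.268098,0.254095
23888,2,f,caucasian,0.810172,1.020760
23889,3,m,caucasian,0.625572,2.882098
"""

default_index = pd.read_csv(io.StringIO(csv_text))                   # row labels 0, 1, 2
by_subject = pd.read_csv(io.StringIO(csv_text), index_col="subjs")   # row labels 1, 2, 3

print(default_index.shape, by_subject.shape)
print(by_subject.loc[3, "K"])      # label-based: subject 3
print(by_subject.iloc[2, 4])       # position-based: third row, fifth column (K)
```

With `index_col="subjs"` the frame has one column fewer, so every column position after `runcode` shifts down by one.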


In [52]:
# executed on the frame loaded without index_col (six columns), so column position 5 is K
data.iloc[2,5]

np.float64(2.882098384)

In [53]:
data.loc[:, "K"]

0     0.254095
1     1.020760
2     2.882098
3     1.024061
4     2.093723
5     1.012419
6     2.904390
7     1.922144
8     2.359120
9     2.088641
10    1.693146
11    1.072074
12    3.813035
13    2.791595
14    0.064458
15    4.186431
16    3.112951
17    4.045537
Name: K, dtype: float64

In [54]:
# position 4 is the RTs column while subjs is still a regular column
data.iloc[:, 4]

0     0.268098
1     0.810172
2     0.625572
3     0.892729
4     0.495700
5     0.117297
6     0.964358
7     0.131785
8     0.529800
9     0.917709
10    0.245590
11    0.815274
12    0.350466
13    0.000590
14    0.960922
15    0.111476
16    0.307025
17    0.641817
Name: RTs, dtype: float64
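
A small sketch of how `loc` (labels) and `iloc` (positions) differ when slicing, using four rows copied from the table and only three of its columns:

```python
import pandas as pd

# Four rows copied from the table; default integer row labels 0..3.
data = pd.DataFrame({
    "subjs": [1, 2, 3, 4],
    "RTs": [0.268098, 0.810172, 0.625572, 0.892729],
    "K": [0.254095, 1.020760, 2.882098, 1.024061],
})

by_label = data.loc[0:2, "RTs"]     # label slice: includes the end label, so 3 rows
by_position = data.iloc[0:2, 1]     # position slice: excludes the end, so 2 rows

print(len(by_label), len(by_position))
print(data.loc[:, ["RTs", "K"]])    # a list of labels selects several columns
```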

In [81]:
# fails: mean() cannot average the text column race
data.groupby('sex').mean()

TypeError: agg function failed [how->mean,dtype->object]
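
The error above arises because `mean()` is asked to average a text column. Two possible ways around it, sketched on four rows copied from the table: restrict the aggregation to numeric columns, or select the columns to average explicitly.

```python
import pandas as pd

# Four rows copied from the table.
data = pd.DataFrame({
    "sex": ["m", "f", "m", "f"],
    "race": ["asian", "caucasian", "caucasian", "asian"],
    "RTs": [0.268098, 0.810172, 0.625572, 0.892729],
    "K": [0.254095, 1.020760, 2.882098, 1.024061],
})

# Option 1: ask for numeric columns only.
means = data.groupby("sex").mean(numeric_only=True)

# Option 2: pick the column to average explicitly.
rt_means = data.groupby("sex")["RTs"].mean()

print(means)
print(rt_means)
```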

In [65]:
data.groupby(['sex', 'race']).mean()

                    runcode      subjs       RTs         K
sex race                                                  
f   asian      23897.166667  11.166667  0.737243  1.620218
    caucasian  23891.000000   5.000000  0.630609  1.645856
m   asian      23895.333333   9.333333  0.208388  2.044557
    caucasian  23896.166667  10.166667  0.374321  2.924689
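
Grouping by two columns produces a two-level (MultiIndex) row index. A sketch of reading one cell from such a result and of flattening it back to ordinary columns, again on four rows copied from the table:

```python
import pandas as pd

# Four rows copied from the table.
data = pd.DataFrame({
    "sex": ["m", "f", "m", "f"],
    "race": ["asian", "caucasian", "caucasian", "asian"],
    "RTs": [0.268098, 0.810172, 0.625572, 0.892729],
})

grouped = data.groupby(["sex", "race"])["RTs"].mean()   # Series with a (sex, race) MultiIndex

print(grouped.loc[("f", "asian")])     # one cell, addressed by a tuple of labels
print(grouped.reset_index())           # back to ordinary columns
```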


In [76]:
data.loc[:, "sex"]

0     m
1     f
2     m
3     f
4     m
5     f
6     f
7     m
8     f
9     f
10    m
11    f
12    m
13    m
14    f
15    m
16    f
17    m
Name: sex, dtype: object

In [82]:
titanic = pd.read_csv('titanic.csv')

In [83]:
print(titanic)

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                                Heikkinen, Miss Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ... 

In [84]:
titanic["Name"].str.lower()

0                                braund, mr. owen harris
1      cumings, mrs. john bradley (florence briggs th...
2                                  heikkinen, miss laina
3           futrelle, mrs. jacques heath (lily may peel)
4                               allen, mr. william henry
                             ...                        
886                                montvila, rev. juozas
887                          graham, miss margaret edith
888              johnston, miss catherine helen "carrie"
889                                behr, mr. karl howell
890                                  dooley, mr. patrick
Name: Name, Length: 891, dtype: object
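
The `.str` accessor applies a string method to every entry of a Series. A short sketch with three names copied from the output above; `str.contains` and `str.len` are shown as further examples alongside `str.lower`.

```python
import pandas as pd

# Three names copied from the Titanic output.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss Laina",
    "Allen, Mr. William Henry",
])

lowered = names.str.lower()              # element-wise lower-casing
is_miss = names.str.contains("Miss")     # boolean mask, one entry per name
lengths = names.str.len()                # number of characters in each name

print(lowered.tolist())
print(names[is_miss].tolist())           # the mask filters the Series
print(lengths.tolist())
```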

In [85]:
titanic["Name"].str.split(",")

0                             [Braund,  Mr. Owen Harris]
1      [Cumings,  Mrs. John Bradley (Florence Briggs ...
2                               [Heikkinen,  Miss Laina]
3        [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
4                            [Allen,  Mr. William Henry]
                             ...                        
886                             [Montvila,  Rev. Juozas]
887                       [Graham,  Miss Margaret Edith]
888           [Johnston,  Miss Catherine Helen "Carrie"]
889                             [Behr,  Mr. Karl Howell]
890                               [Dooley,  Mr. Patrick]
Name: Name, Length: 891, dtype: object

In [86]:
# split on the comma and keep only the first piece, which is the surname
titanic['Surname'] = titanic['Name'].str.split(",").str[0]
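
`str.split` returns a list per row, so a further step is needed to pull out one piece. A sketch on three names copied from the output above; the `Rest` column name is hypothetical.

```python
import pandas as pd

titanic = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss Laina",
    "Allen, Mr. William Henry",
]})

parts = titanic["Name"].str.split(",")          # each entry is a list: [surname, rest]
titanic["Surname"] = parts.str[0]               # first element of each list
titanic["Rest"] = parts.str[1].str.strip()      # second element, leading space removed

# expand=True returns a DataFrame with one column per piece;
# n=1 splits on the first comma only.
pieces = titanic["Name"].str.split(",", n=1, expand=True)

print(titanic[["Surname", "Rest"]])
print(pieces.shape)
```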