# Welcome to my exploration of pandas!

Import pandas and NumPy under their conventional aliases, pd and np.

In [2]:
import numpy as np
import pandas as pd

# df is diabetes info
# mmdf is murder mystery
# dictionary_df is a small dict

## Read in your DataFrames using the read_csv function

In [3]:
df = pd.read_csv("diabetes.csv")
df

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1
767,1,93,70,31,0,30.4,0.315,23,0
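
For readers without the CSV files to hand, here is a minimal sketch of read_csv working on an in-memory string instead of a path. The three sample rows are copied from the output above; the names csv_text, sample_df and small_df are made up for illustration, and usecols and nrows are optional read_csv parameters that the original cells do not use.

```python
import io

import pandas as pd

# A three-row sample with a subset of the diabetes columns (illustrative only).
csv_text = """Pregnancies,Glucose,BloodPressure,Outcome
6,148,72,1
1,85,66,0
8,183,64,1
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)  # (3, 4)

# usecols limits which columns are loaded; nrows limits how many rows are read.
small_df = pd.read_csv(
    io.StringIO(csv_text), usecols=["Glucose", "Outcome"], nrows=2
)
print(small_df.shape)  # (2, 2)
```

Loading only the columns and rows you need can be handy when a file is large.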


This CSV was uploaded using a combination of SQL output commands and terminal-only commands.

In [4]:
mmdf = pd.read_csv('pandas-mm.csv')
mmdf

,id,name,license_id,address_number,address_street_name,ssn
0,10000,Christoper Peteuil,993845,624,Bankhall Ave,747714076
1,10007,Kourtney Calderwood,861794,2791,Gustavus Blvd,477972044
2,10010,Muoi Cary,385336,741,Northwestern Dr,828638512
3,10016,Era Moselle,431897,1987,Wood Glade St,614621061
4,10025,Trena Hornby,550890,276,Daws Hill Way,223877684
...,...,...,...,...,...,...
10006,99936,Luba Benser,274427,680,Carnage Blvd,685095054
10007,99941,Roxana Mckimley,975942,1613,Gate St,512136801
10008,99965,Cherie Zeimantz,287627,3661,The Water Ave,362877324
10009,99982,Allen Cruse,251350,3126,N Jean Dr,348734531


Use the head and tail methods to limit the number of rows displayed. Both return five rows by default.

In [5]:
df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [6]:
df.tail()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
764,2,122,70,27,0,36.8,0.34,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1
767,1,93,70,31,0,30.4,0.315,23,0
768,0,123,77,0,1,36.3,0.252,55,1
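
Both methods also take an optional row count. A small sketch, using a made-up ten-row frame called numbers_df rather than the diabetes data:

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show the n argument.
numbers_df = pd.DataFrame({"value": range(10)})

first_three = numbers_df.head(3)  # first 3 rows instead of the default 5
last_two = numbers_df.tail(2)     # last 2 rows

print(first_three.index.tolist())  # [0, 1, 2]
print(last_two.index.tolist())     # [8, 9]
```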


## Get extra info using the shape attribute and the info method

In [7]:
df.shape

(769, 9)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 769 entries, 0 to 768
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               769 non-null    int64  
 1   Glucose                   769 non-null    int64  
 2   BloodPressure             769 non-null    int64  
 3   SkinThickness             769 non-null    int64  
 4   Insulin                   769 non-null    int64  
 5   BMI                       769 non-null    float64
 6   DiabetesPedigreeFunction  769 non-null    float64
 7   Age                       769 non-null    int64  
 8   Outcome                   769 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.2 KB
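
Note that shape is an attribute (no parentheses) that holds a (rows, columns) tuple, while info is a method that prints its summary. A rough sketch on a made-up frame, tiny_df, with one missing value so the non-null counts differ between columns:

```python
import pandas as pd

# Hypothetical frame with one missing value in column "b".
tiny_df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, None, 1.5]})

# shape is a plain tuple, so it can be unpacked.
n_rows, n_cols = tiny_df.shape
print(n_rows, n_cols)  # 3 2

# info() prints the summary (non-null counts and dtypes) rather than returning it.
tiny_df.info()

# isna().sum() gives the number of missing values per column.
missing_per_column = tiny_df.isna().sum()
print(missing_per_column["b"])  # 1
```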


## Dictionary Basics

You can make a DataFrame from any ordinary Python dictionary by passing the dictionary to the pd.DataFrame constructor. Each key becomes a column name and each list becomes that column's values.

In [9]:
test_dictionary = {
    "first":["Daniel", "Nathan", "Jayne"],
    "last":["Holmes", "Gay", "Mom"],
    "email":["1","2","3"]
}

dictionary_df = pd.DataFrame(test_dictionary)
dictionary_df

,first,last,email
0,Daniel,Holmes,1
1,Nathan,Gay,2
2,Jayne,Mom,3


Access a column of your DataFrame the same way you would access a value in a normal dictionary, using bracket notation.

In [10]:
dictionary_df["first"]

0    Daniel
1    Nathan
2     Jayne
Name: first, dtype: object

It's important to note that a DataFrame is like a container for Series objects.
Each column is a Series, as shown below.

In [11]:
type(dictionary_df)

pandas.core.frame.DataFrame

In [12]:
type(dictionary_df["first"])

pandas.core.series.Series
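
To make the container idea concrete, here is a small sketch that goes the other way and assembles a DataFrame from two Series. The variable names first, last and people_df are chosen for illustration; the values are the ones from the dictionary above.

```python
import pandas as pd

# Two standalone Series that share the default 0..2 index.
first = pd.Series(["Daniel", "Nathan", "Jayne"], name="first")
last = pd.Series(["Holmes", "Gay", "Mom"], name="last")

# A dict of Series becomes a DataFrame with one column per Series.
people_df = pd.DataFrame({"first": first, "last": last})

# Pulling a column back out gives a Series again, on the same row index.
column = people_df["last"]
print(type(column).__name__)                # Series
print(column.index.equals(people_df.index))  # True
```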

You can pass in a list of column names to access multiple columns.

In [13]:
dictionary_df[["first", "last"]]

,first,last
0,Daniel,Holmes
1,Nathan,Gay
2,Jayne,Mom


Note that this returns a DataFrame, not a Series.

In [14]:
type(dictionary_df[["first", "last"]])

pandas.core.frame.DataFrame
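
The same rule holds even when the list contains only one column name. A short sketch on a made-up frame, people_df, comparing the two forms:

```python
import pandas as pd

# Hypothetical frame reusing the names from the dictionary example.
people_df = pd.DataFrame(
    {"first": ["Daniel", "Nathan", "Jayne"], "last": ["Holmes", "Gay", "Mom"]}
)

as_series = people_df["first"]   # single brackets: one-dimensional Series
as_frame = people_df[["first"]]  # list of names: two-dimensional DataFrame

print(as_series.shape)  # (3,)
print(as_frame.shape)   # (3, 1)
```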

## loc and iloc

iloc selects by integer position. Passing a single integer returns that row as a Series.

In [15]:
mmdf.iloc[0]

id                                  10000
name                   Christoper Peteuil
license_id                         993845
address_number                        624
address_street_name          Bankhall Ave
ssn                             747714076
Name: 0, dtype: object

Pass in a list of positions to get multiple rows. Adding a second integer or list after the comma selects columns by position.

In [16]:
mmdf.iloc[[1,10,3,5], 1]

1     Kourtney Calderwood
10         Denver Barness
3             Era Moselle
5         Antione Godbolt
Name: name, dtype: object

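loc is similar, but it selects by label rather than by position. Here the row labels are the default integer index and the column labels are the column names.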

In [17]:
mmdf.loc[[1,5,19] , ["name" , "ssn"]]

,name,ssn
1,Kourtney Calderwood,477972044
5,Antione Godbolt,491650087
19,Everett Flasher,133690837
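
The difference between the two is easier to see when the row labels are not integers. A minimal sketch with a made-up frame, pets_df, whose index is a list of strings:

```python
import pandas as pd

# Hypothetical frame with string row labels instead of the default 0, 1, 2.
pets_df = pd.DataFrame(
    {"species": ["cat", "dog", "owl"], "age": [3, 5, 2]},
    index=["tom", "rex", "hoot"],
)

by_position = pets_df.iloc[1]  # second row, counted from zero
by_label = pets_df.loc["rex"]  # row whose label is "rex"
print(by_position.equals(by_label))  # True

# The same two cells, addressed by label and then by position.
ages_by_label = pets_df.loc[["tom", "hoot"], "age"].tolist()
ages_by_position = pets_df.iloc[[0, 2], 1].tolist()
print(ages_by_label)     # [3, 2]
print(ages_by_position)  # [3, 2]
```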


### Slicing

Slicing with loc is inclusive: 0:2 includes the labels 0, 1 and 2.

In [18]:
df.loc[ 0:2 , "BloodPressure"]

0    72
1    66
2    64
Name: BloodPressure, dtype: int64
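
This differs from iloc, which follows the usual Python convention and excludes the end of the slice. A quick sketch using the first five BloodPressure values shown earlier, held in a standalone Series called bp for illustration:

```python
import pandas as pd

# First five BloodPressure readings from df.head(), as a standalone Series.
bp = pd.Series([72, 66, 64, 66, 40], name="BloodPressure")

label_slice = bp.loc[0:2].tolist()      # label-based: end label is included
position_slice = bp.iloc[0:2].tolist()  # position-based: end position is excluded

print(label_slice)     # [72, 66, 64]
print(position_slice)  # [72, 66]
```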

## The Power Of Pandas

One of the reasons pandas is so powerful is that it comes with a lot of utility functions right out of the box, such as value_counts.

In [19]:
df["BloodPressure"].value_counts()

BloodPressure
70     57
74     52
78     45
68     45
72     44
64     43
80     40
76     39
60     37
0      35
62     34
66     30
82     30
88     25
84     23
90     22
58     21
86     21
50     13
56     12
54     11
52     11
92      8
75      8
65      7
94      6
85      6
48      5
44      4
96      4
110     3
106     3
100     3
98      3
30      2
55      2
104     2
46      2
108     2
122     1
95      1
102     1
61      1
24      1
38      1
40      1
114     1
77      1
Name: count, dtype: int64
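
As a smaller illustration of what value_counts does, here is a sketch on six made-up readings. The normalize=True option, which the cell above does not use, returns proportions instead of raw counts.

```python
import pandas as pd

# Six hypothetical readings, used only to keep the output short.
readings = pd.Series([70, 74, 70, 0, 74, 70], name="BloodPressure")

# Counts are sorted from most to least frequent by default.
counts = readings.value_counts()
print(counts.index.tolist())  # [70, 74, 0]
print(counts.tolist())        # [3, 2, 1]

# normalize=True gives each value's share of the total instead.
shares = readings.value_counts(normalize=True)
print(shares.loc[70])  # 0.5
```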

## Indexes

In [40]:
mmdf.set_index('name' , inplace=True) # sets the index as the name

In [41]:
mmdf.sort_index() # sorts index!

name,id,license_id,address_number,address_street_name,ssn
Aaron Brunken,49350,384559,3734,Fourteen Mile Way,648235164
Aaron Elery,96053,552786,2225,Castle Rd,734477633
Aaron Larcher,86834,338012,356,Harwalt Way,662734103
Aaron Reitler,71064,838065,2939,Cooke Ave,591644629
Abbey Staniec,80579,785801,2379,Pipers Glen Way,193845734
...,...,...,...,...,...
Zora Santio,95979,441152,2265,Tee Circle,899549970
Zoraida Peroni,57987,829852,1874,Coyote Ridge Blvd,774401885
Zoraida Stakemann,59504,412588,628,Molton Dr,537075483
Zula Brisbin,68204,702740,1424,Grasdene Blvd,585404403
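
One thing to keep in mind is that sort_index returns a new, sorted DataFrame and leaves the original untouched unless the result is assigned. The sketch below shows the non-inplace style on a made-up three-row frame, suspects_df, with names and ids copied from the tables above. It also shows a loc lookup by name and reset_index to undo the change.

```python
import pandas as pd

# Three rows borrowed from the output above, in unsorted order.
suspects_df = pd.DataFrame(
    {
        "name": ["Zula Brisbin", "Aaron Elery", "Muoi Cary"],
        "id": [68204, 96053, 10010],
    }
)

# Without inplace=True, set_index returns a new frame and leaves the old one alone.
indexed_df = suspects_df.set_index("name")

# sort_index also returns a new frame, so keep the result.
sorted_df = indexed_df.sort_index()
print(sorted_df.index.tolist())  # ['Aaron Elery', 'Muoi Cary', 'Zula Brisbin']

# With names as the index, loc can look a person up directly.
print(sorted_df.loc["Muoi Cary", "id"])  # 10010

# reset_index turns the index back into an ordinary column.
restored_df = sorted_df.reset_index()
print(restored_df.columns.tolist())  # ['name', 'id']
```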
