In [None]:
#Getting started with Pandas

In [None]:
#Throughout the rest of the book, I use the following import conventions for NumPy and pandas:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [None]:
#5.1 Introduction to Pandas Data Structures
#To get started with pandas, you will need to get comfortable with its two workhorse
#data structures: Series and DataFrame. While they are not a universal solution for
#every problem, they provide a solid foundation for a wide variety of data tasks.

#Series
#A Series is a one-dimensional array-like object containing a sequence of values (of
#similar types to NumPy types) of the same type and an associated array of data labels,
#called its index. The simplest Series is formed from only an array of data:

In [None]:
obj = pd.Series([4, 7, -5, 3])

In [None]:
obj

Unnamed: 0,0
0,4
1,7
2,-5
3,3


In [None]:
#The string representation of a Series displayed interactively shows the index on the
#left and the values on the right. Since we did not specify an index for the data, a
#default one consisting of the integers 0 through N - 1 (where N is the length of the
#data) is created. You can get the array representation and index object of the Series via
#its array and index attributes, respectively:

In [None]:
obj.array

<NumpyExtensionArray>
[np.int64(4), np.int64(7), np.int64(-5), np.int64(3)]
Length: 4, dtype: int64

In [None]:
obj.index

RangeIndex(start=0, stop=4, step=1)

In [None]:
#The result of the .array attribute is a NumpyExtensionArray (named PandasArray in
#older pandas versions), which usually wraps a NumPy array but can also contain
#special extension array types, which will be discussed more in Section 7.3,
#“Extension Data Types.”
#Often, you’ll want to create a Series with an index identifying each data point with a
#label:

In [None]:
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

In [None]:
obj2

Unnamed: 0,0
d,4
b,7
a,-5
c,3


In [None]:
obj2.index

Index(['d', 'b', 'a', 'c'], dtype='object')

In [None]:
#Compared with NumPy arrays, you can use labels in the index when selecting single values or a set of values:

In [None]:
obj2["a"]

np.int64(-5)

In [None]:
obj2["d"]

np.int64(4)

In [None]:
obj2[["c", "a", "d"]]

Unnamed: 0,0
c,3
a,-5
d,4


In [None]:
#Here ["c", "a", "d"] is interpreted as a list of indices, even though it contains
#strings instead of integers. Using NumPy functions or NumPy-like operations, such as filtering with a Boolean
#array, scalar multiplication, or applying math functions, will preserve the index-value link:

In [None]:
obj2[obj2 > 0]

Unnamed: 0,0
d,4
b,7
c,3


In [None]:
obj2 * 2

Unnamed: 0,0
d,8
b,14
a,-10
c,6


In [None]:
np.exp(obj2)

Unnamed: 0,0
d,54.59815
b,1096.633158
a,0.006738
c,20.085537


In [None]:
#Another way to think about a Series is as a fixed-length, ordered dictionary, as it is a
#mapping of index values to data values. It can be used in many contexts where you
#might use a dictionary:

In [None]:
"b" in obj2

True

In [None]:
"e" in obj2

False
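
In [None]:
#A minimal sketch, not part of the original notebook, of the dictionary-like behavior
#described above. The name prices is an illustrative assumption. It shows that the in
#operator tests the index labels rather than the values, and that get returns a default
#for a missing label, much as it does for a dictionary.

```python
import pandas as pd

# Illustrative Series; the name `prices` is an assumption for this sketch.
prices = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

label_present = "b" in prices             # membership tests the index labels
value_as_label = 7 in prices              # 7 is a value, not a label
value_present = 7 in prices.to_numpy()    # test the values explicitly instead
fallback = prices.get("e", 0)             # dict-style lookup with a default

print(label_present, value_as_label, value_present, fallback)
```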

In [None]:
#Should you have data contained in a Python dictionary, you can create a Series from
#it by passing the dictionary:

In [None]:
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}

In [None]:
obj3 = pd.Series(sdata)

In [None]:
obj3

Unnamed: 0,0
Ohio,35000
Texas,71000
Oregon,16000
Utah,5000


In [None]:
obj3.index

Index(['Ohio', 'Texas', 'Oregon', 'Utah'], dtype='object')

In [None]:
#A Series can be converted back to a dictionary with its to_dict method:

In [None]:
obj3.to_dict()

{'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

In [None]:
#When you are only passing a dictionary, the index in the resulting Series will respect
#the order of the keys according to the dictionary’s keys method, which depends on
#the key insertion order. You can override this by passing an index with the dictionary
#keys in the order you want them to appear in the resulting Series:

In [None]:
states = ["California", "Ohio", "Oregon", "Texas"]

In [None]:
obj4 = pd.Series(sdata, index=states)

In [None]:
obj4

Unnamed: 0,0
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0


In [None]:
#Here, three values found in sdata were placed in the appropriate locations, but since
#no value for "California" was found, it appears as NaN (Not a Number), which
#pandas uses to mark missing or NA values. Since "Utah" was not included
#in states, it is excluded from the resulting object.

#I will use the terms “missing,” “NA,” or “null” interchangeably to refer to missing data.
#The isna and notna functions in pandas should be used to detect missing data:

In [None]:
pd.isna(obj4)

Unnamed: 0,0
California,True
Ohio,False
Oregon,False
Texas,False


In [None]:
pd.notna(obj4)

Unnamed: 0,0
California,False
Ohio,True
Oregon,True
Texas,True


In [None]:
#Series also has these as instance methods:

In [None]:
obj4.isna()

Unnamed: 0,0
California,True
Ohio,False
Oregon,False
Texas,False


In [None]:
obj4.notna()

Unnamed: 0,0
California,False
Ohio,True
Oregon,True
Texas,True


In [None]:
#A useful Series feature for many applications is that it automatically aligns by index
#label in arithmetic operations:

In [None]:
obj3

Unnamed: 0,0
Ohio,35000
Texas,71000
Oregon,16000
Utah,5000


In [None]:
obj4

Unnamed: 0,0
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0


In [None]:
obj3 + obj4

Unnamed: 0,0
California,
Ohio,70000.0
Oregon,32000.0
Texas,142000.0
Utah,


In [None]:
#Both the Series object itself and its index have a name attribute, which integrates with
#other areas of pandas functionality:

In [None]:
obj4.name = "population"

In [None]:
obj4.index.name = "state"

In [None]:
obj4

Unnamed: 0_level_0,population
state,Unnamed: 1_level_1
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0


In [None]:
#A Series’s index can be altered in place by assignment:

In [None]:
obj

Unnamed: 0,0
0,4
1,7
2,-5
3,3


In [None]:
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]

In [None]:
obj

Unnamed: 0,0
Bob,4
Steve,7
Jeff,-5
Ryan,3


In [None]:
#DataFrame
#A DataFrame represents a rectangular table of data and contains an ordered, named
#collection of columns, each of which can be a different value type (numeric, string,
#Boolean, etc.). The DataFrame has both a row and column index; it can be thought of
#as a dictionary of Series all sharing the same index.

#While a DataFrame is physically two-dimensional, you can use it
#to represent higher dimensional data in a tabular format using
#hierarchical indexing, a subject we will discuss in Chapter 8 and an
#ingredient in some of the more advanced data-handling features in pandas.

In [None]:
#There are many ways to construct a DataFrame, though one of the most common is
#from a dictionary of equal-length lists or NumPy arrays:

In [None]:
data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
 "year": [2000, 2001, 2002, 2001, 2002, 2003],
 "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

In [None]:
frame = pd.DataFrame(data)

In [None]:
frame

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9
5,Nevada,2003,3.2


In [None]:
#For large DataFrames, the head method selects only the first five rows:

In [None]:
frame.head()

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9


In [None]:
#Similarly, the tail method returns the last five rows:

In [None]:
frame.tail()

Unnamed: 0,state,year,pop
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9
5,Nevada,2003,3.2


In [None]:
#If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order:

In [None]:
pd.DataFrame(data, columns=["year", "state", "pop"])

Unnamed: 0,year,state,pop
0,2000,Ohio,1.5
1,2001,Ohio,1.7
2,2002,Ohio,3.6
3,2001,Nevada,2.4
4,2002,Nevada,2.9
5,2003,Nevada,3.2


In [None]:
#If you pass a column that isn’t contained in the dictionary, it will appear with missing values in the result:

In [None]:
frame2 = pd.DataFrame(data, columns=["year", "state", "pop", "debt"])

In [None]:
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,
1,2001,Ohio,1.7,
2,2002,Ohio,3.6,
3,2001,Nevada,2.4,
4,2002,Nevada,2.9,
5,2003,Nevada,3.2,


In [None]:
frame2.columns

Index(['year', 'state', 'pop', 'debt'], dtype='object')

In [None]:
#A column in a DataFrame can be retrieved as a Series either by dictionary-like
#notation or by using the dot attribute notation:

In [None]:
frame["state"]

Unnamed: 0,state
0,Ohio
1,Ohio
2,Ohio
3,Nevada
4,Nevada
5,Nevada


In [None]:
frame.year

Unnamed: 0,year
0,2000
1,2001
2,2002
3,2001
4,2002
5,2003


In [None]:
#Attribute-like access (e.g., frame2.year) and tab completion of
#column names in IPython are provided as a convenience.
#frame2[column] works for any column name, but frame2.column
#works only when the column name is a valid Python variable name
#and does not conflict with any of the method names in DataFrame.
#For example, if a column’s name contains whitespace or symbols
#other than underscores, it cannot be accessed with the dot attribute method.
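
In [None]:
#A small sketch of the two limitations of dot access described above. The DataFrame
#sales and its column names are hypothetical, chosen only for illustration: one name
#contains whitespace, and the other collides with the existing DataFrame.count method.

```python
import pandas as pd

# Hypothetical columns chosen to show both limitations of dot access.
sales = pd.DataFrame({"unit price": [1.5, 2.0], "count": [3, 4]})

unit_price = sales["unit price"]    # whitespace in the name: only brackets work
count_column = sales["count"]       # bracket access returns the column
count_attr = sales.count            # dot access returns the DataFrame.count method

print(type(count_column).__name__)
print(callable(count_attr))
```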

In [None]:
#Note that the returned Series have the same index as the DataFrame, and their name
#attribute has been appropriately set.

#Rows can also be retrieved by position or name with the special iloc and loc
#attributes (more on this later in “Selection on DataFrame with loc and iloc”):

In [None]:
frame2.loc[1]

Unnamed: 0,1
year,2001
state,Ohio
pop,1.7
debt,


In [None]:
frame2.iloc[2]

Unnamed: 0,2
year,2002
state,Ohio
pop,3.6
debt,


In [None]:
#Columns can be modified by assignment. For example, the empty debt column could
#be assigned a scalar value or an array of values:

In [None]:
frame2["debt"] = 16.5

In [None]:
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,16.5
1,2001,Ohio,1.7,16.5
2,2002,Ohio,3.6,16.5
3,2001,Nevada,2.4,16.5
4,2002,Nevada,2.9,16.5
5,2003,Nevada,3.2,16.5


In [None]:
frame2["debt"] = np.arange(6.)

In [None]:
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,0.0
1,2001,Ohio,1.7,1.0
2,2002,Ohio,3.6,2.0
3,2001,Nevada,2.4,3.0
4,2002,Nevada,2.9,4.0
5,2003,Nevada,3.2,5.0


In [None]:
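#When you are assigning lists or arrays to a column, the value’s length must match the
#length of the DataFrame. If you assign a Series, its labels will be realigned exactly to
#the DataFrame’s index, inserting missing values in any index values not present. In
#the example below, frame2 has the default integer index 0 through 5, so none of the
#labels in val match and every entry in the debt column becomes missing: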

In [None]:
val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])

In [None]:
frame2["debt"] = val

In [None]:
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,
1,2001,Ohio,1.7,
2,2002,Ohio,3.6,
3,2001,Nevada,2.4,
4,2002,Nevada,2.9,
5,2003,Nevada,3.2,
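
In [None]:
#For contrast, here is a sketch in which some of the Series labels do match the
#DataFrame’s index. The names table and debt are illustrative assumptions. Matching
#labels receive their values, rows without a matching label become missing, and a
#Series label that does not exist in the DataFrame (7 here) is ignored.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3.
table = pd.DataFrame({"state": ["Ohio", "Ohio", "Nevada", "Nevada"],
                      "year": [2000, 2001, 2001, 2002]})

# Labels 1 and 3 exist in table.index; label 7 does not and is ignored.
debt = pd.Series([-1.2, -1.5, -1.7], index=[1, 3, 7])
table["debt"] = debt

print(table)
```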


In [None]:
#Assigning a column that doesn’t exist will create a new column.
#The del keyword will delete columns like with a dictionary. As an example, I first add
#a new column of Boolean values where the state column equals "Ohio":

In [None]:
frame2["eastern"] = frame2["state"] == "Ohio"

In [None]:
frame2

Unnamed: 0,year,state,pop,debt,eastern
0,2000,Ohio,1.5,,True
1,2001,Ohio,1.7,,True
2,2002,Ohio,3.6,,True
3,2001,Nevada,2.4,,False
4,2002,Nevada,2.9,,False
5,2003,Nevada,3.2,,False


In [None]:
#New columns cannot be created with the frame2.eastern dot attribute notation.
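
In [None]:
#A sketch of what happens when dot notation is used for a new name, using a
#hypothetical DataFrame called table. The assignment only sets an ordinary Python
#attribute on the object (pandas may emit a warning, which is silenced here), so no
#column appears until bracket notation is used.

```python
import warnings

import pandas as pd

table = pd.DataFrame({"state": ["Ohio", "Nevada"]})

with warnings.catch_warnings():
    # pandas may warn that columns cannot be created via a new attribute name.
    warnings.simplefilter("ignore")
    table.eastern = [True, False]            # sets a plain Python attribute

created_by_dot = "eastern" in table.columns

table["eastern"] = table["state"] == "Ohio"  # bracket assignment creates the column
created_by_brackets = "eastern" in table.columns

print(created_by_dot, created_by_brackets)
```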

In [None]:
#The del keyword can then be used to remove this column:

del frame2["eastern"]

In [None]:
frame2.columns

Index(['year', 'state', 'pop', 'debt'], dtype='object')

In [None]:
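#In pandas versions that do not use Copy-on-Write, the column returned from
#indexing a DataFrame is a view on the underlying data, not a copy, so any
#in-place modifications to the Series will be reflected in the DataFrame.
#With Copy-on-Write enabled, modifying the returned Series leaves the
#DataFrame unchanged. In either case, the column can be explicitly copied
#with the Series’s copy method.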
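
In [None]:
#A minimal sketch of the explicit copy mentioned above, using a hypothetical
#DataFrame called table. Changes made to the copied Series do not reach the
#DataFrame.

```python
import pandas as pd

table = pd.DataFrame({"pop": [1.5, 1.7, 3.6]})

pop_copy = table["pop"].copy()   # independent copy of the column
pop_copy.iloc[0] = 99.0          # modifies only the copy

print(table.loc[0, "pop"])       # the DataFrame keeps its original value
print(pop_copy.iloc[0])
```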

In [None]:
#Another common form of data is a nested dictionary of dictionaries:

In [None]:
populations = {"Ohio": {2000: 1.5, 2001: 1.7, 2002:3.6}, "Nevada": {2001: 2.4, 2002: 2.0}}

In [None]:
#If the nested dictionary is passed to the DataFrame, pandas will interpret the outer
#dictionary keys as the columns, and the inner keys as the row indices

In [None]:
frame3 = pd.DataFrame(populations)

In [None]:
frame3

Unnamed: 0,Ohio,Nevada
2000,1.5,
2001,1.7,2.4
2002,3.6,2.0


In [None]:
#You can transpose the DataFrame (swap rows and columns) with similar syntax to a NumPy array:

In [None]:
frame3.T

Unnamed: 0,2000,2001,2002
Ohio,1.5,1.7,3.6
Nevada,,2.4,2.0


In [None]:
#Note that transposing discards the column data types if the columns do not all have the same data type, so transposing and then
#transposing back may lose the previous type information. The columns become arrays of pure Python objects in this case.
#The keys in the inner dictionaries are combined to form the index in the result. This
#isn’t true if an explicit index is specified:

In [None]:
pd.DataFrame(populations, index=[2001, 2002, 2003])

Unnamed: 0,Ohio,Nevada
2001,1.7,2.4
2002,3.6,2.0
2003,,


In [None]:
#Dictionaries of Series are treated in much the same way

In [None]:
pdata = {"Ohio": frame3["Ohio"][:-1], "Nevada": frame3["Nevada"][:2]}

In [None]:
pd.DataFrame(pdata)

Unnamed: 0,Ohio,Nevada
2000,1.5,
2001,1.7,2.4


In [None]:
#If a DataFrame's index and columns have their name attributes set, these will also be displayed

In [None]:
frame3.index.name = "year"

In [None]:
frame3.columns.name = "state"

In [None]:
frame3

state,Ohio,Nevada
year,Unnamed: 1_level_1,Unnamed: 2_level_1
2000,1.5,
2001,1.7,2.4
2002,3.6,2.0


In [None]:
#Unlike Series, DataFrame does not have a name attribute. DataFrame’s to_numpy
#method returns the data contained in the DataFrame as a two-dimensional ndarray:

In [None]:
frame3.to_numpy()

array([[1.5, nan],
       [1.7, 2.4],
       [3.6, 2. ]])

In [None]:
#If the DataFrame’s columns are different data types, the data type of the returned
#array will be chosen to accommodate all of the columns:

In [None]:
frame2.to_numpy()

array([[2000, 'Ohio', 1.5, nan],
       [2001, 'Ohio', 1.7, nan],
       [2002, 'Ohio', 3.6, nan],
       [2001, 'Nevada', 2.4, nan],
       [2002, 'Nevada', 2.9, nan],
       [2003, 'Nevada', 3.2, nan]], dtype=object)

In [None]:
#Index Objects

In [None]:
#pandas’s Index objects are responsible for holding the axis labels (including a
#DataFrame’s column names) and other metadata (like the axis name or names). Any array
#or other sequence of labels you use when constructing a Series or DataFrame is
#internally converted to an Index:

In [None]:
obj = pd.Series(np.arange(3), index=["a", "b", "c"])

In [None]:
index = obj.index

In [None]:
index

Index(['a', 'b', 'c'], dtype='object')

In [None]:
index[1:]

Index(['b', 'c'], dtype='object')

In [None]:
#Index objects are immutable and thus can’t be modified by the user:

In [None]:
index[1] = "d" #Type error

TypeError: Index does not support mutable operations
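
In [None]:
#A sketch of handling the error and of one way to obtain changed labels: build a new
#Index and leave the original untouched. The names new_index and assignment_failed
#are illustrative assumptions.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

try:
    index[1] = "d"                 # in-place assignment is rejected
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new Index instead; the original stays unchanged.
new_index = pd.Index(["d" if label == "b" else label for label in index])

print(assignment_failed)
print(list(index), list(new_index))
```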

In [None]:
#Immutability makes it safer to share Index objects among data structures:

In [None]:
labels = pd.Index(np.arange(3))

In [None]:
labels

Index([0, 1, 2], dtype='int64')

In [None]:
obj2 = pd.Series([1.5, -2.5, 0], index=labels)

In [None]:
obj2

Unnamed: 0,0
0,1.5
1,-2.5
2,0.0


In [None]:
obj2.index is labels

True

In [None]:
#In addition to being array-like, an Index also behaves like a fixed-size set:

In [None]:
frame3

state,Ohio,Nevada
year,Unnamed: 1_level_1,Unnamed: 2_level_1
2000,1.5,
2001,1.7,2.4
2002,3.6,2.0


In [None]:
frame3.columns

Index(['Ohio', 'Nevada'], dtype='object', name='state')

In [None]:
"Ohio" in frame3.columns

True

In [None]:
2003 in frame3.index

False

In [None]:
#Unlike Python sets, a pandas Index can contain duplicate labels:

In [None]:
pd.Index(["foo", "foo", "bar", "bar"])

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

In [None]:
#Selections with duplicate labels will select all occurrences of that label.
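
In [None]:
#A short sketch of selection with duplicate labels, using a hypothetical Series
#called dup. A label that occurs more than once yields a Series, while a label that
#occurs once yields a scalar, so the result type depends on the data.

```python
import pandas as pd

dup = pd.Series([0, 1, 2, 3], index=["foo", "foo", "bar", "baz"])

repeated = dup.loc["foo"]    # label occurs twice: the result is a Series
single = dup.loc["baz"]      # label occurs once: the result is a scalar

print(dup.index.is_unique)
print(repeated)
print(single)
```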

In [None]:
#5.2 Essential Functionality
#Reindexing

In [None]:
#An important method on pandas objects is reindex, which means to create a new
#object with the values rearranged to align with the new index. Consider an example:

In [None]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

In [None]:
obj

Unnamed: 0,0
d,4.5
b,7.2
a,-5.3
c,3.6


In [None]:
#Calling reindex on this Series rearranges the data according to the new index,
#introducing missing values if any index values were not already present:

In [None]:
obj2 = obj.reindex(["a", "b", "c", "d", "e"])

In [None]:
obj2

Unnamed: 0,0
a,-5.3
b,7.2
c,3.6
d,4.5
e,


In [None]:
#For ordered data like time series, you may want to do some interpolation or filling of
#values when reindexing. The method option allows us to do this, using a method such
#as ffill, which forward-fills the values:

In [None]:
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

In [None]:
obj3

Unnamed: 0,0
0,blue
2,purple
4,yellow


In [None]:
obj3.reindex(np.arange(6), method="ffill")

Unnamed: 0,0
0,blue
1,blue
2,purple
3,purple
4,yellow
5,yellow
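
In [None]:
#As a complement to ffill, here is a sketch using method="bfill", which fills each
#new label with the next existing value. The Series mirrors obj3 above under the
#illustrative name colors. Position 5 has no later value and stays missing.

```python
import numpy as np
import pandas as pd

colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# bfill takes the next valid value when a new label has no exact match.
backward = colors.reindex(np.arange(6), method="bfill")

print(backward)
```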


In [None]:
#With DataFrame, reindex can alter the (row) index, columns, or both. When passed
#only a sequence, it reindexes the rows in the result:

In [None]:
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
 index=["a", "c", "d"],
 columns=["Ohio", "Texas", "California"])

In [None]:
frame

Unnamed: 0,Ohio,Texas,California
a,0,1,2
c,3,4,5
d,6,7,8


In [None]:
frame2 = frame.reindex(index=["a", "b", "c", "d"])

In [None]:
frame2

Unnamed: 0,Ohio,Texas,California
a,0.0,1.0,2.0
b,,,
c,3.0,4.0,5.0
d,6.0,7.0,8.0


In [None]:
#The columns can be reindexed with the columns keyword:

In [None]:
states = ["Texas", "Utah", "California"]

In [None]:
frame.reindex(columns=states)

Unnamed: 0,Texas,Utah,California
a,1,,2
c,4,,5
d,7,,8


In [None]:
#Because "Ohio" was not in states, the data for that column is dropped from the result.
#Another way to reindex a particular axis is to pass the new axis labels as a positional
#argument and then specify the axis to reindex with the axis keyword:

In [None]:
frame.reindex(states, axis="columns")

Unnamed: 0,Texas,Utah,California
a,1,,2
c,4,,5
d,7,,8


In [None]:
#As we’ll explore later in “Selection on DataFrame with loc and iloc,” you
#can also reindex by using the loc operator, and many users prefer to always do it this
#way. This works only if all of the new index labels already exist in the DataFrame
#(whereas reindex will insert missing data for new labels):

In [None]:
frame.loc[["a", "d", "c"], ["California", "Texas"]]

Unnamed: 0,California,Texas
a,2,1
d,8,7
c,5,4
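
In [None]:
#A sketch contrasting the two approaches when a requested label does not exist. It
#rebuilds the same frame as above. The names loc_raised and reindexed are
#illustrative assumptions.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])

try:
    frame.loc[["a", "b"]]          # "b" is not an existing row label
    loc_raised = False
except KeyError:
    loc_raised = True

reindexed = frame.reindex(index=["a", "b"])   # "b" becomes a row of missing values

print(loc_raised)
print(reindexed)
```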


In [None]:
#Dropping Entries from an Axis
#Dropping one or more entries from an axis is simple if you already have an index
#array or list without those entries, since you can use the reindex method or
#.loc-based indexing. As that can require a bit of munging and set logic, the drop method
#will return a new object with the indicated value or values deleted from an axis:

In [None]:
obj = pd.Series(np.arange(5.), index=["a", "b", "c", "d", "e"])

In [None]:
obj

Unnamed: 0,0
a,0.0
b,1.0
c,2.0
d,3.0
e,4.0


In [None]:
new_obj = obj.drop("c")

In [None]:
new_obj

Unnamed: 0,0
a,0.0
b,1.0
d,3.0
e,4.0


In [None]:
obj.drop(["d", "c"])

Unnamed: 0,0
a,0.0
b,1.0
e,4.0


In [None]:
#With DataFrame, index values can be deleted from either axis. To illustrate this, we
#first create an example DataFrame:

In [None]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
 index=["Ohio", "Colorado", "Utah", "New York"],
 columns=["one", "two", "three", "four"])

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [None]:
#Calling drop with a sequence of labels will drop values from the row labels (axis 0):

In [None]:
data.drop(index=["Colorado", "Ohio"])

Unnamed: 0,one,two,three,four
Utah,8,9,10,11
New York,12,13,14,15


In [None]:
#To drop labels from the columns, instead use the columns keyword:

In [None]:
data.drop(columns=["two"])

Unnamed: 0,one,three,four
Ohio,0,2,3
Colorado,4,6,7
Utah,8,10,11
New York,12,14,15


In [None]:
#You can also drop values from the columns by passing axis=1 (which is like NumPy)
#or axis="columns":

In [None]:
data.drop("two", axis=1)

Unnamed: 0,one,three,four
Ohio,0,2,3
Colorado,4,6,7
Utah,8,10,11
New York,12,14,15


In [None]:
data.drop(["two", "four"], axis="columns")

Unnamed: 0,one,three
Ohio,0,2
Colorado,4,6
Utah,8,10
New York,12,14
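
In [None]:
#A sketch confirming that drop returns a new object and leaves the original
#unchanged. It also uses the errors="ignore" option of drop, which skips labels that
#do not exist instead of raising. The names trimmed and same are illustrative
#assumptions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

trimmed = data.drop(columns=["two"])                      # new object
same = data.drop(columns=["missing"], errors="ignore")    # unknown label skipped

print(list(data.columns))      # original is unchanged
print(list(trimmed.columns))
```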


In [None]:
#Indexing, Selection, and Filtering
#Series indexing (obj[...]) works analogously to NumPy array indexing, except you
#can use the Series’s index values instead of only integers. Here are some examples:

In [None]:
obj = pd.Series(np.arange(4.), index=["a", "b", "c", "d"])

In [None]:
obj

Unnamed: 0,0
a,0.0
b,1.0
c,2.0
d,3.0


In [None]:
obj["b"]

np.float64(1.0)

In [None]:
obj[1]

np.float64(1.0)

In [None]:
obj[2:4]

Unnamed: 0,0
c,2.0
d,3.0


In [None]:
obj[["b", "a", "d"]]

Unnamed: 0,0
b,1.0
a,0.0
d,3.0


In [None]:
obj[obj<2]

Unnamed: 0,0
a,0.0
b,1.0


In [None]:
#While you can select data by label this way, the preferred way to select index values is
#with the special loc operator:

In [None]:
obj.loc[["b", "a", "d"]]

Unnamed: 0,0
b,1.0
a,0.0
d,3.0


In [None]:
#loc is preferred because [] treats integers differently depending on the index.
#Regular []-based indexing will treat integers as labels if the index
#contains integers, so the behavior differs depending on the data type of the index. For
#example:

In [None]:
obj1 = pd.Series([1, 2, 3], index=[2, 0, 1])

In [None]:
obj2 = pd.Series([1, 2, 3], index=["a", "b", "c"])

In [None]:
obj1

Unnamed: 0,0
2,1
0,2
1,3


In [None]:
obj2

Unnamed: 0,0
a,1
b,2
c,3


In [None]:
obj1[[0, 1, 2]]

Unnamed: 0,0
0,2
1,3
2,1


In [None]:
obj2[[0, 1, 2]]

Unnamed: 0,0
a,1
b,2
c,3


In [None]:
#When using loc, an expression such as obj2.loc[[0, 1]] will fail when the index does
#not contain integers:

In [None]:
obj2.loc[[0, 1]]

KeyError: "None of [Index([0, 1], dtype='int64')] are in the [index]"

In [None]:
#Since the loc operator indexes exclusively with labels, there is also an iloc operator
#that indexes exclusively with integers to work consistently whether or not the index
#contains integers:

In [None]:
obj1.iloc[[0, 1, 2]]

Unnamed: 0,0
2,1
0,2
1,3


In [None]:
obj2.iloc[[0, 1, 2]]

Unnamed: 0,0
a,1
b,2
c,3


In [None]:
#Indexing into a DataFrame retrieves one or more columns either with a single value
#or sequence:

In [None]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
 index=["Ohio", "Colorado", "Utah", "New York"],
 columns=["one", "two", "three", "four"])

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [None]:
data["two"]

Unnamed: 0,two
Ohio,1
Colorado,5
Utah,9
New York,13


In [None]:
data[["three", "one"]]

Unnamed: 0,three,one
Ohio,2,0
Colorado,6,4
Utah,10,8
New York,14,12


In [None]:
#Indexing like this has a few special cases. The first is slicing or selecting data with a
#Boolean array:

In [None]:
data[:2]

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7


In [None]:
data[data["three"] > 5]

Unnamed: 0,one,two,three,four
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [None]:
#The row selection syntax data[:2] is provided as a convenience. Passing a single
#element or a list to the [] operator selects columns.

In [None]:
#Another use case is indexing with a Boolean DataFrame, such as one produced by
#a scalar comparison. Consider a DataFrame with all Boolean values produced by
#comparing with a scalar value:

In [None]:
data < 5

Unnamed: 0,one,two,three,four
Ohio,True,True,True,True
Colorado,True,False,False,False
Utah,False,False,False,False
New York,False,False,False,False


In [None]:
#We can use this DataFrame to assign the value 0 to each location with the value True,
#like so:

In [None]:
data[data<5] = 0

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,0,0,0,0
Colorado,0,5,6,7
Utah,8,9,10,11
New York,12,13,14,15
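
In [None]:
#The assignment above modifies data in place. As a sketch of a non-mutating
#alternative, the DataFrame.mask method replaces values where a condition is True and
#returns a new object. The name zeroed is an illustrative assumption, and the
#DataFrame is rebuilt here so the example is self-contained.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

# mask replaces values where the condition is True and returns a new object.
zeroed = data.mask(data < 5, 0)

print(zeroed)
print(data.loc["Ohio", "two"])   # the original value is still present in data
```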


In [None]:
#Selection on DataFrame with loc and iloc
#Like Series, DataFrame has special attributes loc and iloc for label-based and
#integer-based indexing, respectively. Since DataFrame is two-dimensional, you can
#select a subset of the rows and columns with NumPy-like notation using either axis
#labels (loc) or integers (iloc).
#As a first example, let’s select a single row by label:

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,0,0,0,0
Colorado,0,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [None]:
data.loc["Colorado"]

Unnamed: 0,Colorado
one,0
two,5
three,6
four,7


In [None]:
#The result of selecting a single row is a Series with an index that contains the
#DataFrame’s column labels. To select multiple rows, creating a new DataFrame, pass a
#sequence of labels:

In [None]:
data.loc[["Colorado", "New York"]]

Unnamed: 0,one,two,three,four
Colorado,0,5,6,7
New York,12,13,14,15


In [None]:
#You can combine both row and column selection in loc by separating the selections
#with a comma

In [None]:
data.loc["Colorado", ["two", "three"]]

Unnamed: 0,Colorado
two,5
three,6


In [None]:
#We’ll then perform some similar selections with integers using iloc:

In [None]:
data.iloc[2]

Unnamed: 0,Utah
one,8
two,9
three,10
four,11


In [None]:
data.iloc[[2, 1]]

Unnamed: 0,one,two,three,four
Utah,8,9,10,11
Colorado,0,5,6,7


In [None]:
data.iloc[2, [3, 0, 1]]

Unnamed: 0,Utah
four,11
one,8
two,9


In [None]:
#Both indexing functions work with slices in addition to single labels or lists of labels:

In [None]:
data.loc[:"Utah", "two"]

Unnamed: 0,two
Ohio,0
Colorado,5
Utah,9


In [None]:
data.iloc[:, :3][data.three > 5]

Unnamed: 0,one,two,three
Colorado,0,5,6
Utah,8,9,10
New York,12,13,14


In [None]:
#A Boolean Series can be used with loc but not with iloc, which accepts only a plain Boolean array:

In [None]:
data.loc[data.three >= 2]

Unnamed: 0,one,two,three,four
Colorado,0,5,6,7
Utah,8,9,10,11
New York,12,13,14,15
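
In [None]:
#A sketch of Boolean selection with both operators, assuming a freshly built copy of
#the example DataFrame. loc takes the Boolean Series directly, while for iloc the
#Series is first converted to a plain NumPy array with to_numpy. The names mask,
#by_label, and by_position are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

mask = data["three"] > 5                   # Boolean Series aligned on the row labels

by_label = data.loc[mask]                  # loc accepts the Boolean Series
by_position = data.iloc[mask.to_numpy()]   # iloc is given a plain Boolean array

print(by_label)
```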


In [None]:
#Integer indexing pitfalls
#Working with pandas objects indexed by integers can be a stumbling block for new
#users since they work differently from built-in Python data structures like lists and
#tuples. For example, you might not expect the following code to generate an error:

In [None]:
ser = pd.Series(np.arange(3.))

In [None]:
ser

Unnamed: 0,0
0,0.0
1,1.0
2,2.0


In [None]:
ser[-1]

KeyError: -1

In [None]:
#In this case, pandas could “fall back” on integer indexing, but it is difficult to do
#this in general without introducing subtle bugs into the user code. Here we have an
#index containing 0, 1, and 2, but pandas does not want to guess what the user wants
#(label-based indexing or position-based):

In [None]:
ser

Unnamed: 0,0
0,0.0
1,1.0
2,2.0


In [None]:
#On the other hand, with a noninteger index, there is no such ambiguity:

In [None]:
ser2 = pd.Series(np.arange(3.), index=["a", "b", "c"])

In [None]:
ser2[-1]

np.float64(2.0)

In [None]:
#If you have an axis index containing integers, data selection will always be label
#oriented. As I said above, if you use loc (for labels) or iloc (for integers) you will get
#exactly what you want:

In [None]:
ser.iloc[-1]

np.float64(2.0)

In [None]:
#On the other hand, slicing with integers is always integer oriented:

In [None]:
ser[:2]

Unnamed: 0,0
0,0.0
1,1.0


In [None]:
#As a result of these pitfalls, it is best to always prefer indexing with loc and iloc to
#avoid ambiguity.

In [None]:
#Pitfalls with chained indexing
#In the previous section we looked at how you can do flexible selections on a
#DataFrame using loc and iloc. These indexing attributes can also be used to modify
#DataFrame objects in place, but doing so requires some care.
#For example, in the example DataFrame above, we can assign to a column or row by
#label or integer position:

In [None]:
data.loc[:, "one"] = 1

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,1,0,0,0
Colorado,1,5,6,7
Utah,1,9,10,11
New York,1,13,14,15


In [None]:
data.iloc[2] = 5

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,1,0,0,0
Colorado,1,5,6,7
Utah,5,5,5,5
New York,1,13,14,15


In [None]:
data.loc[data["four"] > 5] = 3

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,1,0,0,0
Colorado,3,3,3,3
Utah,5,5,5,5
New York,3,3,3,3


In [None]:
data.loc[data.three == 5] = 6

In [None]:
data

Unnamed: 0,one,two,three,four
Ohio,1,0,0,0
Colorado,3,3,3,3
Utah,6,6,6,6
New York,3,3,3,3
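
In [None]:
#The assignments above each use a single loc or iloc call. As a sketch of why that
#matters, the chained form shown in the comment below selects rows first and then
#indexes the intermediate result, so the assignment may land on a temporary object
#rather than on data. Passing the row and column selectors to one loc call avoids
#this. The DataFrame is rebuilt here so the example is self-contained.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

# Chained form (two indexing steps) may assign to a temporary object:
#     data.loc[data["three"] > 5]["three"] = 6
# A single loc call with both row and column selectors modifies data itself.
data.loc[data["three"] > 5, "three"] = 6

print(data)
```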


In [None]:
#Arithmetic and Data Alignment
#pandas can make it much simpler to work with objects that have different indexes.
#For example, when you add objects, if any index pairs are not the same, the respective
#index in the result will be the union of the index pairs. Let’s look at an example:

In [None]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])

In [None]:
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],
  index=["a", "c", "e", "f", "g"])

In [None]:
s1

Unnamed: 0,0
a,7.3
c,-2.5
d,3.4
e,1.5


In [None]:
s2

Unnamed: 0,0
a,-2.1
c,3.6
e,-1.5
f,4.0
g,3.1


In [None]:
#Adding these yields

In [None]:
s1 + s2

Unnamed: 0,0
a,5.2
c,1.1
d,
e,0.0
f,
g,


In [None]:
#The internal data alignment introduces missing values in the label locations that don’t
#overlap. Missing values will then propagate in further arithmetic computations.

#In the case of DataFrame, alignment is performed on both rows and columns:

In [None]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list("bcd"),
  index=["Ohio", "Texas", "Colorado"])

In [None]:
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"),
  index=["Utah", "Ohio", "Texas", "Oregon"])

In [None]:
df1

Unnamed: 0,b,c,d
Ohio,0.0,1.0,2.0
Texas,3.0,4.0,5.0
Colorado,6.0,7.0,8.0


In [None]:
df2

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [None]:
#Adding these returns a DataFrame with index and columns that are the unions of the ones in each DataFrame:

In [None]:
df1 + df2

Unnamed: 0,b,c,d,e
Colorado,,,,
Ohio,3.0,,6.0,
Oregon,,,,
Texas,9.0,,12.0,
Utah,,,,


In [None]:
#Since the "c" and "e" columns are not found in both DataFrame objects, they appear
#as missing in the result. The same holds for the rows with labels that are not common
#to both objects.

#If you add DataFrame objects with no column or row labels in common, the result
#will contain all nulls:

In [None]:
df1 = pd.DataFrame({"A": [1, 2]})

In [None]:
df2 = pd.DataFrame({"B": [3, 4]})

In [None]:
df1

Unnamed: 0,A
0,1
1,2


In [None]:
df2

Unnamed: 0,B
0,3
1,4


In [None]:
df1 + df2

Unnamed: 0,A,B
0,,
1,,


In [None]:
#Arithmetic methods with fill values
#In arithmetic operations between differently indexed objects, you might want to fill
#with a special value, like 0, when an axis label is found in one object but not the other.

#Here is an example where we set a particular value to NA (null) by assigning np.nan
#to it:

In [None]:
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
  columns=list("abcd"))

In [None]:
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)),
  columns=list("abcde"))

In [None]:
df2.loc[1, "b"] = np.nan

In [None]:
df1

Unnamed: 0,a,b,c,d
0,0.0,1.0,2.0,3.0
1,4.0,5.0,6.0,7.0
2,8.0,9.0,10.0,11.0


In [None]:
df2

Unnamed: 0,a,b,c,d,e
0,0.0,1.0,2.0,3.0,4.0
1,5.0,,7.0,8.0,9.0
2,10.0,11.0,12.0,13.0,14.0
3,15.0,16.0,17.0,18.0,19.0


In [None]:
#Adding these results in missing values in the locations that don’t overlap:

In [None]:
df1 + df2

Unnamed: 0,a,b,c,d,e
0,0.0,2.0,4.0,6.0,
1,9.0,,13.0,15.0,
2,18.0,20.0,22.0,24.0,
3,,,,,
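
In [None]:
#A sketch of the fill behavior described at the start of this section: the add
#method accepts a fill_value argument, which substitutes that value for an entry
#missing from one of the two objects before adding. The DataFrames are rebuilt here
#so the example is self-contained, and the name filled is an illustrative assumption.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list("abcde"))
df2.loc[1, "b"] = np.nan

# A value missing on one side is treated as 0 before adding.
filled = df1.add(df2, fill_value=0)

print(filled)
```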


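In [None]:
#Each arithmetic method has a counterpart, starting with the letter r, that has its
#arguments reversed, so the next two statements are equivalent:
1 / df1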

Unnamed: 0,a,b,c,d
0,inf,1.0,0.5,0.333333
1,0.25,0.2,0.166667,0.142857
2,0.125,0.111111,0.1,0.090909


In [None]:
df1.rdiv(1)

Unnamed: 0,a,b,c,d
0,inf,1.0,0.5,0.333333
1,0.25,0.2,0.166667,0.142857
2,0.125,0.111111,0.1,0.090909


In [None]:
#Relatedly, when reindexing a Series or DataFrame, you can also specify a different fill value:

In [None]:
df1.reindex(columns=df2.columns, fill_value=0)

Unnamed: 0,a,b,c,d,e
0,0.0,1.0,2.0,3.0,0
1,4.0,5.0,6.0,7.0,0
2,8.0,9.0,10.0,11.0,0


In [None]:
#Operations between DataFrame and Series

#As with NumPy arrays of different dimensions, arithmetic between DataFrame and
#Series is also defined. First, as a motivating example, consider the difference between
#a two-dimensional array and one of its rows:

In [None]:
arr = np.arange(12.).reshape((3, 4))

In [None]:
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [None]:
arr[0]

array([0., 1., 2., 3.])

In [None]:
arr - arr[0]

array([[0., 0., 0., 0.],
       [4., 4., 4., 4.],
       [8., 8., 8., 8.]])

In [None]:
#When we subtract arr[0] from arr, the subtraction is performed once for each row.
#This is referred to as broadcasting and is explained in more detail as it relates to
#general NumPy arrays in Appendix A. Operations between a DataFrame and a Series
#are similar:

In [None]:
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"), index=["Utah", "Ohio", "Texas", "Oregon"])

In [None]:
series = frame.iloc[0]

In [None]:
frame

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [None]:
series

Unnamed: 0,Utah
b,0.0
d,1.0
e,2.0


In [None]:
#By default, arithmetic between DataFrame and Series matches the index of the Series
#on the columns of the DataFrame, broadcasting down the rows:

In [None]:
frame - series

Unnamed: 0,b,d,e
Utah,0.0,0.0,0.0
Ohio,3.0,3.0,3.0
Texas,6.0,6.0,6.0
Oregon,9.0,9.0,9.0
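
In [None]:
#A small sketch comparing the labeled subtraction above with plain NumPy broadcasting,
#rebuilding the same frame and series. It also illustrates that the match is made by
#label rather than by position; the names by_label, by_position, shuffled, and
#reordered are illustrative:

In [None]:
```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series = frame.iloc[0]

# Labeled subtraction and positional NumPy broadcasting agree here
# because the labels happen to line up in the same order.
by_label = frame - series
by_position = frame.to_numpy() - series.to_numpy()
print(np.array_equal(by_label.to_numpy(), by_position))

# Reordering the Series does not change the result, since pandas matches on labels.
shuffled = series[["e", "b", "d"]]
reordered = (frame - shuffled)[frame.columns]
print(np.allclose(reordered.to_numpy(), by_label.to_numpy()))
```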


In [None]:
#If a label is found in only one of the DataFrame’s columns or the Series’s index,
#the objects will be reindexed to form the union, with missing values for the unmatched labels:

In [None]:
series2 = pd.Series(np.arange(3), index=["b", "e", "f"])

In [None]:
series2

Unnamed: 0,0
b,0
e,1
f,2


In [None]:
frame + series2

Unnamed: 0,b,d,e,f
Utah,0.0,,3.0,
Ohio,3.0,,6.0,
Texas,6.0,,9.0,
Oregon,9.0,,12.0,
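
In [None]:
#If the all-null columns are unwanted, one option (a sketch, not the only approach) is
#to restrict both operands to the labels they share before adding. The names common
#and result are illustrative:

In [None]:
```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series2 = pd.Series(np.arange(3), index=["b", "e", "f"])

# Labels present in both the DataFrame's columns and the Series's index.
common = frame.columns.intersection(series2.index)
print(list(common))

# Restricting both operands to the shared labels avoids the all-null columns.
result = frame[common] + series2[common]
print(result)
```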


In [None]:
#If you want to instead broadcast over the columns, matching on the rows, you have to
#use one of the arithmetic methods and specify to match over the index. For example:

In [None]:
series3 = frame["d"]

In [None]:
frame

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [None]:
series3

Unnamed: 0,d
Utah,1.0
Ohio,4.0
Texas,7.0
Oregon,10.0


In [None]:
frame.sub(series3, axis="index")

Unnamed: 0,b,d,e
Utah,-1.0,0.0,1.0
Ohio,-1.0,0.0,1.0
Texas,-1.0,0.0,1.0
Oregon,-1.0,0.0,1.0


In [None]:
#The axis that you pass is the axis to match on. In this case we mean to match on the
#DataFrame’s row index (axis="index") and broadcast across the columns.
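
In [None]:
#To see why the axis argument matters, this sketch contrasts the default behavior with
#axis="index" on the same frame. With the default, the state names in the Series index
#are matched against the column labels, so nothing lines up. The names default and
#by_rows are illustrative:

In [None]:
```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series3 = frame["d"]

# Default: the Series index (state names) is matched against the columns (b, d, e).
# No label is shared, so the result has the union of labels as columns and is all null.
default = frame - series3
print(default.shape)

# axis="index": the Series index is matched against the row labels instead.
by_rows = frame.sub(series3, axis="index")
print(by_rows)
```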

In [None]:
#Function Application and Mapping

In [None]:
#NumPy ufuncs (element-wise array methods) also work with pandas objects:

In [None]:
frame = pd.DataFrame(np.random.standard_normal((4, 3)),
    columns=list("bde"), index=["Utah", "Ohio", "Texas", "Oregon"])

In [None]:
frame

Unnamed: 0,b,d,e
Utah,-0.612354,0.681299,1.344035
Ohio,-1.638351,-0.887647,0.16523
Texas,-1.525378,-0.029098,-1.166085
Oregon,1.464312,1.144394,1.081654


In [None]:
np.abs(frame)

Unnamed: 0,b,d,e
Utah,0.612354,0.681299,1.344035
Ohio,1.638351,0.887647,0.16523
Texas,1.525378,0.029098,1.166085
Oregon,1.464312,1.144394,1.081654


In [None]:
#Another frequent operation is applying a function on one-dimensional arrays to each
#column or row. DataFrame’s apply method does exactly this:

In [None]:
def f1(x):
    return x.max() - x.min()

In [None]:
frame.apply(f1)

Unnamed: 0,0
b,3.102662
d,2.03204
e,2.51012


In [None]:
#Here the function f1, which computes the difference between the maximum and
#minimum of a Series, is invoked once on each column in frame. The result is a Series
#having the columns of frame as its index.

#If you pass axis="columns" to apply, the function will be invoked once per row
#instead. A helpful way to think about this is as “apply across the columns”:

In [None]:
frame.apply(f1, axis="columns")

Unnamed: 0,0
Utah,1.956389
Ohio,1.803581
Texas,1.49628
Oregon,0.382657
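
In [None]:
#For a reduction like f1, the same numbers can also be obtained from the built-in max
#and min methods. This sketch checks that on a randomly generated frame; the seed and
#the names via_apply, via_methods, row_apply, and row_methods are arbitrary choices:

In [None]:
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

def f1(x):
    return x.max() - x.min()

# Per column: apply versus built-in reductions.
via_apply = frame.apply(f1)
via_methods = frame.max() - frame.min()
print(np.allclose(via_apply, via_methods))

# Per row: pass axis="columns" to both approaches.
row_apply = frame.apply(f1, axis="columns")
row_methods = frame.max(axis="columns") - frame.min(axis="columns")
print(np.allclose(row_apply, row_methods))
```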


In [None]:
#Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

#The function passed to apply need not return a scalar value; it can also return a Series
#with multiple values:

In [None]:
def f2(x):
    return pd.Series([x.min(), x.max()], index=["min", "max"])

In [None]:
frame.apply(f2)

Unnamed: 0,b,d,e
min,-1.638351,-0.887647,-1.166085
max,1.464312,1.144394,1.344035
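
In [None]:
#For this particular pair of statistics, the agg method with a list of function names
#gives an equivalent table. The sketch below compares the two on a randomly generated
#frame; the seed and the names via_apply and via_agg are arbitrary choices:

In [None]:
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

def f2(x):
    return pd.Series([x.min(), x.max()], index=["min", "max"])

# apply with a Series-returning function: one result row per returned label.
via_apply = frame.apply(f2)

# agg with a list of names: one result row per named reduction.
via_agg = frame.agg(["min", "max"])
print(via_agg)
```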


In [None]:
#Element-wise Python functions can be used, too. Suppose you wanted to compute
#a formatted string from each floating-point value in frame. You can do this with
#applymap (deprecated since pandas 2.1 in favor of DataFrame.map, which behaves the
#same way):

In [None]:
def my_format(x):
    return f"{x: .2f}"

In [None]:
frame.applymap(my_format)

Unnamed: 0,b,d,e
Utah,-0.61,0.68,1.34
Ohio,-1.64,-0.89,0.17
Texas,-1.53,-0.03,-1.17
Oregon,1.46,1.14,1.08
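
In [None]:
#A version-tolerant sketch of the same element-wise formatting. It uses DataFrame.map
#where available (pandas 2.1 and later) and falls back to applymap otherwise. The small
#frame and the name elementwise are made up for illustration:

In [None]:
```python
import pandas as pd

# Small illustrative frame with values chosen so the formatted output is predictable.
frame = pd.DataFrame({"b": [1.5, -2.25], "d": [0.1, 3.0]}, index=["Utah", "Ohio"])

def my_format(x):
    # The space in the format spec reserves a column for the sign.
    return f"{x: .2f}"

# DataFrame.map was added in pandas 2.1; older versions only have applymap.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
formatted = elementwise(my_format)
print(formatted)
```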


In [None]:
#The reason for the name applymap is that Series has a map method for applying an
#element-wise function:

In [None]:
frame["e"].map(my_format)

Unnamed: 0,e
Utah,1.34
Ohio,0.17
Texas,-1.17
Oregon,1.08


In [None]:
#Sorting and Ranking
#Sorting a dataset by some criterion is another important built-in operation. To sort
#lexicographically by row or column label, use the sort_index method, which returns
#a new, sorted object:

In [None]:
obj = pd.Series(np.arange(4), index=["d", "a", "b", "c"])

In [None]:
obj

Unnamed: 0,0
d,0
a,1
b,2
c,3


In [None]:
obj.sort_index()

Unnamed: 0,0
a,1
b,2
c,3
d,0


In [None]:
#With a DataFrame, you can sort by index on either axis:

In [None]:
frame = pd.DataFrame(np.arange(8).reshape((2, 4)), index=["three", "one"], columns=["d", "a", "b", "c"])

In [None]:
frame

Unnamed: 0,d,a,b,c
three,0,1,2,3
one,4,5,6,7


In [None]:
frame.sort_index()

Unnamed: 0,d,a,b,c
one,4,5,6,7
three,0,1,2,3


In [None]:
frame.sort_index(axis="columns")

Unnamed: 0,a,b,c,d
three,1,2,3,0
one,5,6,7,4


In [None]:
#The data is sorted in ascending order by default but can be sorted in descending
#order, too:

In [None]:
frame.sort_index(axis="columns", ascending=False)

Unnamed: 0,d,c,b,a
three,0,3,2,1
one,4,7,6,5


In [None]:
#To sort a Series by its values, use its sort_values method:

In [None]:
obj = pd.Series([4, 7, -3, 2])

In [None]:
obj.sort_values()

Unnamed: 0,0
2,-3
3,2
0,4
1,7


In [None]:
#Any missing values are sorted to the end of the Series by default:

In [None]:
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

In [None]:
obj.sort_values()

Unnamed: 0,0
4,-3.0
5,2.0
0,4.0
2,7.0
1,
3,


In [None]:
#Missing values can be sorted to the start instead by using the na_position option:

In [None]:
obj.sort_values(na_position="first")

Unnamed: 0,0
1,
3,
4,-3.0
5,2.0
0,4.0
2,7.0
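
In [None]:
#The placement of missing values is controlled by na_position separately from the sort
#direction. This sketch sorts the same data in descending order with both settings; the
#names descending and descending_first are illustrative:

In [None]:
```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# Descending sort: the missing values still go to the end by default.
descending = obj.sort_values(ascending=False)
print(descending)

# Descending sort with the missing values moved to the start.
descending_first = obj.sort_values(ascending=False, na_position="first")
print(descending_first)
```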


In [None]:
#When sorting a DataFrame, you can use the data in one or more columns as the sort
#keys. To do so, pass one or more column names to sort_values:

In [None]:
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

In [None]:
frame

Unnamed: 0,b,a
0,4,0
1,7,1
2,-3,0
3,2,1


In [None]:
frame.sort_values("b")

Unnamed: 0,b,a
2,-3,0
3,2,1
0,4,0
1,7,1


In [None]:
#To sort by multiple columns, pass a list of names:

In [None]:
frame.sort_values(["a", "b"])

Unnamed: 0,b,a
2,-3,0
0,4,0
3,2,1
1,7,1
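
In [None]:
#When sorting by several columns, the ascending argument can also be a list with one
#flag per column. A short sketch on the same frame, sorting "a" ascending and "b"
#descending; the name mixed is illustrative:

In [None]:
```python
import pandas as pd

frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# One ascending flag per sort key: "a" ascending, then "b" descending within each "a".
mixed = frame.sort_values(["a", "b"], ascending=[True, False])
print(mixed)
```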


In [None]:
#Ranking assigns ranks from one through the number of valid data points in an array,
#starting from the lowest value. The rank methods for Series and DataFrame are the
#place to look; by default, rank breaks ties by assigning each group the mean rank:

In [None]:
obj = pd.Series([8, -5, 7, 4, 2, 0, 4])

In [None]:
obj.rank()

Unnamed: 0,0
0,7.0
1,1.0
2,6.0
3,4.5
4,3.0
5,2.0
6,4.5


In [None]:
#Ranks can also be assigned according to the order in which they're observed in the data:

In [None]:
obj.rank(method="first")

Unnamed: 0,0
0,7.0
1,1.0
2,6.0
3,4.0
4,3.0
5,2.0
6,5.0


In [None]:
#Here, instead of using the average rank 4.5 for the entries 3 and 6, they have
#been set to 4 and 5 because label 3 precedes label 6 in the data.

#You can rank in descending order, too:

In [None]:
obj.rank(ascending=False)

Unnamed: 0,0
0,1.0
1,7.0
2,2.0
3,3.5
4,5.0
5,6.0
6,3.5
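
In [None]:
#Besides the default ("average") and "first", rank also accepts "min", "max", and
#"dense" as tie-breaking methods. This sketch puts them side by side for the same
#Series; the table name ranks is illustrative:

In [None]:
```python
import pandas as pd

obj = pd.Series([8, -5, 7, 4, 2, 0, 4])

# One column per tie-breaking method; the tied entries are labels 3 and 6 (value 4).
ranks = pd.DataFrame({
    "average": obj.rank(),              # mean rank of the tied group
    "min": obj.rank(method="min"),      # lowest rank in the group
    "max": obj.rank(method="max"),      # highest rank in the group
    "first": obj.rank(method="first"),  # order of appearance in the data
    "dense": obj.rank(method="dense"),  # like "min", but ranks increase by 1 between groups
})
print(ranks)
```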


In [None]:
#DataFrame can compute ranks over the rows or the columns:

In [None]:
frame = pd.DataFrame({"b": [4.3, 7, -3, 2], "a": [0, 1, 0, 1],
  "c": [-2, 5, 8, -2.5]})

In [None]:
frame

Unnamed: 0,b,a,c
0,4.3,0,-2.0
1,7.0,1,5.0
2,-3.0,0,8.0
3,2.0,1,-2.5


In [None]:
frame.rank(axis="columns")

Unnamed: 0,b,a,c
0,3.0,2.0,1.0
1,3.0,1.0,2.0
2,1.0,2.0,3.0
3,3.0,2.0,1.0


In [None]:
#Axis Indexes with Duplicate Labels
#Up until now almost all of the examples we have looked at have unique axis labels
#(index values). While many pandas functions (like reindex) require that the labels be
#unique, it’s not mandatory. Let’s consider a small Series with duplicate indices:

In [None]:
obj = pd.Series(np.arange(5), index=["a", "a", "b", "b", "c"])

In [None]:
obj

Unnamed: 0,0
a,0
a,1
b,2
b,3
c,4


In [None]:
#The is_unique property of the index can tell you whether or not its labels are unique:

In [None]:
obj.index.is_unique

False

In [None]:
#Data selection is one of the main things that behaves differently with duplicates.
#Indexing a label with multiple entries returns a Series, while single entries return a
#scalar value:

In [None]:
obj.loc["a"]

Unnamed: 0,0
a,0
a,1


In [None]:
obj.loc["c"]

np.int64(4)

In [None]:
#This can make your code more complicated, as the output type from indexing can
#vary based on whether or not a label is repeated.

#The same logic extends to indexing rows (or columns) in a DataFrame:

In [None]:
df = pd.DataFrame(np.random.standard_normal((5, 3)), index=["a", "a", "b", "b", "c"])

In [None]:
df

Unnamed: 0,0,1,2
a,-1.660842,-0.920176,0.775826
a,-0.06605,-3.45902,-0.551572
b,0.891979,0.829131,0.649216
b,-0.496242,-1.033814,-0.233162
c,-1.343429,0.777418,-0.364439


In [None]:
df.loc["b"]

Unnamed: 0,0,1,2
b,0.891979,0.829131,0.649216
b,-0.496242,-1.033814,-0.233162


In [None]:
df.loc["c"]

Unnamed: 0,c
0,-1.343429
1,0.777418
2,-0.364439
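
In [None]:
#One way to keep the output type stable regardless of duplicates (a sketch, not the
#only option) is to pass a list of labels to loc, which yields a Series even for a
#single match. Index.duplicated can be used to find the repeated labels. The name
#dup_mask is illustrative:

In [None]:
```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5), index=["a", "a", "b", "b", "c"])

# A list of labels always yields a Series, whether or not the label is repeated.
print(type(obj.loc[["a"]]).__name__, len(obj.loc[["a"]]))
print(type(obj.loc[["c"]]).__name__, len(obj.loc[["c"]]))

# Index.duplicated marks every repeat of a label that appeared earlier.
dup_mask = obj.index.duplicated()
print(obj.index[dup_mask].tolist())
```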


In [None]:
#5.3 Summarising and Computing Descriptive Statistics