---
<center><h1>Basic intro into pandas</h1></center> 

<center><h2>Working with pandas DataFrames: main operations, sorting and selecting by type</h2></center>

---

## Table of Contents
- [Work with pandas DataFrames: main operations, sorting and selecting by type](#Work-with-pandas-DataFrames:-main-operations,-sorting-and-selecting-by-type)
    * [Flexible comparisons and boolean reductions](#Flexible-comparisons-and-boolean-reductions)
    * [Descriptive statistics](#Descriptive-statistics)
    * [Function application](#Function-application)
    * [Sorting](#Sorting)
    * [Selecting by type](#Selecting-by-type)

In [3]:
import pandas as pd
import numpy as np
import random

## Work with pandas DataFrames: main operations, sorting and selecting by type

[[back to top]](#Table-of-Contents)

In this part we will consider the following questions:
1.	how to quickly compare two or more DataFrames or check whether a DataFrame's items satisfy a condition;
2.	which mathematical (computational) and statistical operations are built into pandas and can be applied directly to a DataFrame's data;
3.	how to apply an arbitrary function to a DataFrame's items, rows, columns, or the whole DataFrame, and how to change its data type;
4.	how to sort row and column data;
5.	how to select columns by their type.

First, let's find all unique values in the `'Province_State'` column of the COVID-19 dataset.

In [62]:
from arcgis.features import GeoAccessor, GeoSeriesAccessor
#Import a COVID data layer.  This layer contains the updated stats for each county in the United States
from arcgis.features import FeatureLayer
mylayer = FeatureLayer(("https://services1.arcgis.com/0MSEUqKaxRlEPj5g/ArcGIS/rest/services/ncov_cases_US/FeatureServer/0"))
sdf2 = pd.DataFrame.spatial.from_layer(mylayer)
sdf2.head(4)

Unnamed: 0,OBJECTID,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Recovered,Deaths,Active,Admin2,FIPS,Combined_Key,Incident_Rate,People_Tested,People_Hospitalized,UID,ISO3,SHAPE
0,1,Alabama,US,2021-02-24 04:24:54,32.539527,-86.644082,6143,0,84,6059,Autauga,1001,"Autauga, Alabama, US",10995.364155,,,84001001,USA,"{""x"": -86.64408226999996, ""y"": 32.539527450000..."
1,2,Alabama,US,2021-02-24 04:24:54,30.72775,-87.722071,19554,0,263,19291,Baldwin,1003,"Baldwin, Alabama, US",8759.418368,,,84001003,USA,"{""x"": -87.72207057999998, ""y"": 30.727749910000..."
2,3,Alabama,US,2021-02-24 04:24:54,31.868263,-85.387129,2084,0,50,2034,Barbour,1005,"Barbour, Alabama, US",8442.031921,,,84001005,USA,"{""x"": -85.38712859999998, ""y"": 31.868263000000..."
3,4,Alabama,US,2021-02-24 04:24:54,32.996421,-87.125115,2432,0,59,2373,Bibb,1007,"Bibb, Alabama, US",10860.0518,,,84001007,USA,"{""x"": -87.12511459999996, ""y"": 32.996420640000..."


In [63]:
sdf2.dtypes

OBJECTID                        int64
Province_State                 object
Country_Region                 object
Last_Update            datetime64[ns]
Lat                           float64
Long_                         float64
Confirmed                       int64
Recovered                       int64
Deaths                          int64
Active                          int64
Admin2                         object
FIPS                           object
Combined_Key                   object
Incident_Rate                 float64
People_Tested                  object
People_Hospitalized            object
UID                             int64
ISO3                           object
SHAPE                        geometry
dtype: object

In [64]:
# get unique values
unique_states = sdf2['Province_State'].drop_duplicates().dropna()
unique_states

0                    Alabama
69                    Alaska
98                   Arizona
115                 Arkansas
191               California
250                 Colorado
316              Connecticut
325                 Delaware
329     District of Columbia
330                  Florida
398                  Georgia
559                   Hawaii
565                    Idaho
610                 Illinois
714                  Indiana
807                     Iowa
907                   Kansas
1014                Kentucky
1135               Louisiana
1201                   Maine
1219                Maryland
1244           Massachusetts
1258                Michigan
1345               Minnesota
1433             Mississippi
1516                Missouri
1633                 Montana
1690                Nebraska
1784                  Nevada
1802           New Hampshire
1813              New Jersey
1835              New Mexico
1869                New York
1933          North Carolina
2034          

Above we used the `drop_duplicates()` method to keep only the unique values of the Series, then `dropna()` to discard missing ones.
Below we filter the same DataFrame down to specific states using `isin()`.

In [65]:
SouthCentraldf = sdf2[sdf2['Province_State'].isin(["Mississippi", "Alabama", "Louisiana"])]
SouthCentraldf

Unnamed: 0,OBJECTID,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Recovered,Deaths,Active,Admin2,FIPS,Combined_Key,Incident_Rate,People_Tested,People_Hospitalized,UID,ISO3,SHAPE
0,1,Alabama,US,2021-02-24 04:24:54,32.539527,-86.644082,6143,0,84,6059,Autauga,01001,"Autauga, Alabama, US",10995.364155,,,84001001,USA,"{""x"": -86.64408226999996, ""y"": 32.539527450000..."
1,2,Alabama,US,2021-02-24 04:24:54,30.727750,-87.722071,19554,0,263,19291,Baldwin,01003,"Baldwin, Alabama, US",8759.418368,,,84001003,USA,"{""x"": -87.72207057999998, ""y"": 30.727749910000..."
2,3,Alabama,US,2021-02-24 04:24:54,31.868263,-85.387129,2084,0,50,2034,Barbour,01005,"Barbour, Alabama, US",8442.031921,,,84001005,USA,"{""x"": -85.38712859999998, ""y"": 31.868263000000..."
3,4,Alabama,US,2021-02-24 04:24:54,32.996421,-87.125115,2432,0,59,2373,Bibb,01007,"Bibb, Alabama, US",10860.051800,,,84001007,USA,"{""x"": -87.12511459999996, ""y"": 32.996420640000..."
4,5,Alabama,US,2021-02-24 04:24:54,33.982109,-86.567906,6058,0,125,5933,Blount,01009,"Blount, Alabama, US",10476.256355,,,84001009,USA,"{""x"": -86.56790592999994, ""y"": 33.982109180000..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1511,1512,Mississippi,US,2021-02-24 04:24:54,33.613005,-89.283929,1086,0,29,1057,Webster,28155,"Webster, Mississippi, US",11208.587057,,,84028155,USA,"{""x"": -89.28392911999998, ""y"": 33.613004860000..."
1512,1513,Mississippi,US,2021-02-24 04:24:54,31.160782,-91.310188,622,0,26,596,Wilkinson,28157,"Wilkinson, Mississippi, US",7207.415991,,,84028157,USA,"{""x"": -91.31018818999996, ""y"": 31.160782250000..."
1513,1514,Mississippi,US,2021-02-24 04:24:54,33.087479,-89.033914,2201,0,74,2127,Winston,28159,"Winston, Mississippi, US",12258.423837,,,84028159,USA,"{""x"": -89.03391384999998, ""y"": 33.087479080000..."
1514,1515,Mississippi,US,2021-02-24 04:24:54,34.028242,-89.707621,1419,0,36,1383,Yalobusha,28161,"Yalobusha, Mississippi, US",11719.524281,,,84028161,USA,"{""x"": -89.70762049999996, ""y"": 34.028241750000..."


Now we can filter the resulting DataFrame `SouthCentraldf` by the `Deaths` column, keeping only counties with fewer than 50 deaths.

In [66]:
lowDeathsSouthCentraldf = SouthCentraldf[SouthCentraldf['Deaths'] < 50]
lowDeathsSouthCentraldf

Unnamed: 0,OBJECTID,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Recovered,Deaths,Active,Admin2,FIPS,Combined_Key,Incident_Rate,People_Tested,People_Hospitalized,UID,ISO3,SHAPE
5,6,Alabama,US,2021-02-24 04:24:54,32.100305,-85.712655,1160,0,33,1127,Bullock,01011,"Bullock, Alabama, US",11484.011484,,,84001011,USA,"{""x"": -85.71265534999998, ""y"": 32.100305330000..."
9,10,Alabama,US,2021-02-24 04:24:54,34.178060,-85.606390,1757,0,37,1720,Cherokee,01019,"Cherokee, Alabama, US",6707.130860,,,84001019,USA,"{""x"": -85.60638967999995, ""y"": 34.178059830000..."
11,12,Alabama,US,2021-02-24 04:24:54,32.022273,-88.265644,547,0,23,524,Choctaw,01023,"Choctaw, Alabama, US",4345.063150,,,84001023,USA,"{""x"": -88.26564429999996, ""y"": 32.022273410000..."
12,13,Alabama,US,2021-02-24 04:24:54,31.680999,-87.835486,3423,0,48,3375,Clarke,01025,"Clarke, Alabama, US",14490.728981,,,84001025,USA,"{""x"": -87.83548596999998, ""y"": 31.680998590000..."
14,15,Alabama,US,2021-02-24 04:24:54,33.676792,-85.520059,1361,0,39,1322,Cleburne,01029,"Cleburne, Alabama, US",9128.101945,,,84001029,USA,"{""x"": -85.52005898999994, ""y"": 33.676792040000..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507,1508,Mississippi,US,2021-02-24 04:24:54,31.149715,-90.104467,1230,0,40,1190,Walthall,28147,"Walthall, Mississippi, US",8609.827803,,,84028147,USA,"{""x"": -90.10446653999998, ""y"": 31.149715480000..."
1510,1511,Mississippi,US,2021-02-24 04:24:54,31.641357,-88.695739,2501,0,40,2461,Wayne,28153,"Wayne, Mississippi, US",12391.616707,,,84028153,USA,"{""x"": -88.69573912999994, ""y"": 31.641356870000..."
1511,1512,Mississippi,US,2021-02-24 04:24:54,33.613005,-89.283929,1086,0,29,1057,Webster,28155,"Webster, Mississippi, US",11208.587057,,,84028155,USA,"{""x"": -89.28392911999998, ""y"": 33.613004860000..."
1512,1513,Mississippi,US,2021-02-24 04:24:54,31.160782,-91.310188,622,0,26,596,Wilkinson,28157,"Wilkinson, Mississippi, US",7207.415991,,,84028157,USA,"{""x"": -91.31018818999996, ""y"": 31.160782250000..."
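
The two filtering steps above can also be written as a single boolean mask. The sketch below uses a small stand-in table, not the real layer: the `toy` DataFrame and its numbers are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical miniature stand-in for the county table (values are made up).
toy = pd.DataFrame({
    "Province_State": ["Alabama", "Alabama", "Texas", "Louisiana", "Mississippi"],
    "Admin2": ["County1", "County2", "County3", "County4", "County5"],
    "Deaths": [59, 33, 4884, 180, 29],
})

# Each condition yields a boolean Series; combine them element-wise with
# & (and) or | (or). Python's `and` / `or` do not work on Series.
in_region = toy["Province_State"].isin(["Mississippi", "Alabama", "Louisiana"])
low_deaths = toy["Deaths"] < 50

result = toy[in_region & low_deaths]
print(result)
```

When the conditions are written inline, each one needs its own parentheses, for example `toy[(toy["Deaths"] < 50) & in_region]`, because `&` binds more tightly than `<`.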


We are going to take a break from real data and show some examples using made-up data. These examples compare different DataFrames (or Series) with each other.

In [67]:
df_ABC = pd.DataFrame({'A': [1,2,3], 'B': [3,4,5], 'C': [-1,9,-4]})
df_ABC

Unnamed: 0,A,B,C
0,1,3,-1
1,2,4,9
2,3,5,-4


In [68]:
df_ACD = pd.DataFrame({'A': [0,4,9], 'C': [-1,-3,-2], 'D': [0,1,-2]})
df_ACD

Unnamed: 0,A,C,D
0,0,-1,0
1,4,-3,1
2,9,-2,-2


In [69]:
df_ABC.le(df_ACD)

Unnamed: 0,A,B,C,D
0,False,False,True,False
1,True,False,False,False
2,True,False,True,False


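pandas compares elements that share the same row and column labels. Columns present in only one of the DataFrames (`B` and `D` here) are compared against missing values, so the result for them is `False`.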
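
To make the label alignment concrete, here is a minimal sketch that reuses the two toy DataFrames and contrasts the flexible method `le()` with the plain `<=` operator. The variable names `flexible`, `shared` and `strict` are introduced only for this illustration.

```python
import pandas as pd

df_ABC = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 4, 5], 'C': [-1, 9, -4]})
df_ACD = pd.DataFrame({'A': [0, 4, 9], 'C': [-1, -3, -2], 'D': [0, 1, -2]})

# The flexible method aligns on labels first; B and D have no partner,
# so they are compared against NaN and come out False.
flexible = df_ABC.le(df_ACD)
print(flexible)

# The plain operator refuses frames whose labels differ.
try:
    df_ABC <= df_ACD
    operator_failed = False
except ValueError as err:
    operator_failed = True
    print("operator raised:", err)

# Restricting both frames to the shared columns makes the operator usable.
shared = df_ABC.columns.intersection(df_ACD.columns)
strict = df_ABC[shared] <= df_ACD[shared]
print(strict)
```

The other flexible comparison methods (`eq`, `ne`, `lt`, `gt`, `ge`) follow the same alignment rule.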

You can also use the reductions `empty`, `any()`, `all()`, and `bool()` to summarize a boolean result (note that `bool()` is deprecated since pandas 2.1):

In [70]:
# column-wise check (the default, axis=0): is every item in each column negative?
(df_ACD < 0).all()

A    False
C     True
D    False
dtype: bool

In [71]:
# row-wise check (axis=1): is every item in each row negative?
(df_ACD < 0).all(axis=1)

0    False
1    False
2    False
dtype: bool

In [72]:
# column-wise check again: does at least one item
# in each column satisfy the condition?
(df_ACD < 0).any()

A    False
C     True
D     True
dtype: bool

In [73]:
# here we check if at least one item in the whole DataFrame satisfies the condition
(df_ACD < 0).any().any()

True

In [74]:
# here we check if DataFrame is empty (no elements)
df_ACD.empty

False

The approach above lets you find the columns (or rows) that satisfy a condition. It is helpful when you need to quickly check whether a DataFrame, or some of its rows or columns, contains, for instance, only positive values, without caring which elements those are; this is the main difference between filtering and flexible comparisons. Remember that you can invert a boolean mask with the `~` operator; Python's `not` keyword does not work element-wise on pandas objects.
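
A short sketch of inverting a boolean mask, reusing `df_ACD` from above. The names `negative`, `non_negative` and `clean_columns` are introduced here only for illustration.

```python
import pandas as pd

df_ACD = pd.DataFrame({'A': [0, 4, 9], 'C': [-1, -3, -2], 'D': [0, 1, -2]})

negative = df_ACD < 0
non_negative = ~negative          # element-wise NOT, same as df_ACD >= 0

# Python's `not` asks for one truth value for the whole object,
# which pandas refuses to guess.
try:
    not negative
    not_raised = False
except ValueError:
    not_raised = True

# Columns that contain no negative value at all.
clean_columns = non_negative.all()
print(clean_columns[clean_columns].index.tolist())
```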

### Descriptive statistics

[[back to top]](#Table-of-Contents)

pandas provides a large number of methods for computing descriptive statistics and other related mathematical operations on Series and DataFrame. Most of these are aggregations, but some of them produce an object of the same size. The most common ones are summarized in the table below:

|Function|Description|
|--|-------------------------------|
|abs|absolute value|
|count|number of non-null observations|
|sum|sum of values|
|mean|mean of values|
|mad|mean absolute deviation (removed in pandas 2.0)|
|median|arithmetic median of values|
|min|minimum value|
|max|maximum value|
|idxmin|index label of minimum value|
|idxmax|index label of maximum value|
|mode|mode|
|prod|product of values|
|std|unbiased standard deviation|
|var|unbiased variance|
|cumsum|cumulative sum (a sequence of partial sums of a given sequence)|
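
As a quick illustration of several entries in the table, the sketch below runs them on the first five `Deaths` values shown earlier. It also computes the mean absolute deviation by hand, since the `mad` method is no longer available in pandas 2.x; the names `deaths`, `mean_abs_dev` and `summary` are introduced only for this example.

```python
import pandas as pd

deaths = pd.Series([84, 263, 50, 59, 125])   # first five 'Deaths' values shown earlier

# Mean absolute deviation written out explicitly (what `mad` used to return).
mean_abs_dev = (deaths - deaths.mean()).abs().mean()

# std and var use ddof=1 by default, i.e. the unbiased sample estimates.
summary = {
    "count": deaths.count(),
    "sum": deaths.sum(),
    "mean": deaths.mean(),
    "median": deaths.median(),
    "std": deaths.std(),
    "cumsum": deaths.cumsum().tolist(),
}
print(mean_abs_dev)
print(summary)
```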

Let’s demonstrate how you can use these methods:

In [75]:
sdf2['Deaths'].sum()

502434

In [76]:
sdf2['Deaths'].mean()

153.69654328540838

In [84]:
# returns the average of each numeric column (in scientific notation);
# recent pandas versions may need sdf2.mean(numeric_only=True) here
sdf2.mean()

OBJECTID               1.635000e+03
Lat                    3.795721e+01
Long_                 -9.153506e+01
Confirmed              8.641443e+03
Recovered              0.000000e+00
Deaths                 1.536965e+02
Active                 8.487746e+03
Incident_Rate          8.741577e+03
People_Tested                   NaN
People_Hospitalized             NaN
UID                    8.351861e+07
dtype: float64

In [85]:
# average over the whole DataFrame; it has no real meaning in this case
sdf2.mean().mean()

9282912.65028847

In [86]:
sdf2['Deaths'].max(), sdf2['Deaths'].idxmax()

(20057, 209)
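
Note that `idxmax()` returns an index label, not an integer position. The two coincide above only because `sdf2` appears to use the default 0, 1, 2, ... index. A minimal sketch with a hypothetical name-indexed Series shows the difference:

```python
import pandas as pd

# Hypothetical Series indexed by county name instead of 0, 1, 2, ...
deaths = pd.Series([84, 263, 50], index=["Autauga", "Baldwin", "Barbour"])

label = deaths.idxmax()        # index label of the maximum
position = deaths.argmax()     # integer position of the maximum

print(label, position)
print(deaths.loc[label], deaths.iloc[position])
```

Labels pair with `.loc` and positions pair with `.iloc`; mixing them up gives wrong rows as soon as the index is not the default one.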

In [94]:
# Which county has had the most deaths?
# idxmax() returns an index label, so .loc is the matching accessor
sdf2.loc[209]

OBJECTID                                                             210
Province_State                                                California
Country_Region                                                        US
Last_Update                                          2021-02-24 04:24:54
Lat                                                            34.308284
Long_                                                        -118.228241
Confirmed                                                        1183378
Recovered                                                              0
Deaths                                                             20057
Active                                                           1163321
Admin2                                                       Los Angeles
FIPS                                                               06037
Combined_Key                                 Los Angeles, California, US
Incident_Rate                                      

### Function application

[[back to top]](#Table-of-Contents)

pandas lets you apply your own function, or one from a library, to pandas objects (in particular, Series and DataFrame). To apply a function to each row or column of a DataFrame, use the `apply` method. To transform the individual elements of a single column or row (a Series), use `map`, which works like Python's built-in `map()`. To apply a function to every element of a DataFrame (rather than to a column or a row), use `applymap`; pandas 2.1 and later offer the same behavior under the name `DataFrame.map` and deprecate `applymap`.
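
The three levels can be compared side by side on a tiny, made-up DataFrame. This is only a sketch: `toy` and the lambdas are illustrative, and the element-wise step is written as `apply` plus `map` so that it does not depend on which pandas version provides `applymap` or `DataFrame.map`.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply: the function receives a whole column (axis=0) or a whole row (axis=1).
col_range = toy.apply(lambda col: col.max() - col.min())
row_sum = toy.apply(lambda row: row["a"] + row["b"], axis=1)

# map: the function receives one element of a Series at a time.
labelled = toy["a"].map(lambda x: f"item-{x}")

# Element-wise over the whole frame: map every column in turn.
squared = toy.apply(lambda col: col.map(lambda x: x ** 2))

print(col_range, row_sum, labelled, squared, sep="\n\n")
```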

For instance, we can find the average value of each numeric column of the `sdf2` DataFrame this way:

In [95]:
sdf2.loc[:, (sdf2.dtypes == np.int64) | (sdf2.dtypes == np.float64)].apply(np.mean)

OBJECTID         1.635000e+03
Lat              3.795721e+01
Long_           -9.153506e+01
Confirmed        8.641443e+03
Recovered        0.000000e+00
Deaths           1.536965e+02
Active           8.487746e+03
Incident_Rate    8.741577e+03
UID              8.351861e+07
dtype: float64

or of each row (recall that the `axis` argument sets the direction of the calculation: `axis=0` works down each column, `axis=1` works across each row)

In [96]:
sdf2.loc[:, (sdf2.dtypes == np.int64) | (sdf2.dtypes == np.float64)]. \
                 apply(np.mean, axis=1).head(10)

0    9.336025e+06
1    9.338757e+06
2    9.334841e+06
3    9.335187e+06
4    9.335950e+06
5    9.334974e+06
6    9.334986e+06
7    9.337622e+06
8    9.335323e+06
9    9.334578e+06
dtype: float64

or find the absolute difference between the maximum and minimum values, multiplied by the number of non-null elements, in each column

In [97]:
sdf2.loc[:, (sdf2.dtypes == np.int64) | (sdf2.dtypes == np.float64)].apply(lambda x: abs(x.max() - x.min())*x.count())

OBJECTID         1.068309e+07
Lat              1.643662e+05
Long_            3.486064e+05
Confirmed        3.868463e+09
Recovered        0.000000e+00
Deaths           6.556633e+07
Active           3.810082e+09
Incident_Rate    1.036516e+08
UID              6.870802e+10
dtype: float64

You can also pass a function of your own, defined beforehand, to `apply`:

In [103]:
def my_own_func(x, power, delta=0):
    if x < 20:
        return (x - delta)**power
    elif x >= 20:
        return round(power/x, 2)
    else:
        return  np.nan
    
sdf2['Deaths'].apply(my_own_func, args=(2,), delta=1).head(10)

0    0.02
1    0.01
2    0.04
3    0.03
4    0.02
5    0.06
6    0.03
7    0.01
8    0.02
9    0.05
Name: Deaths, dtype: float64

where the first argument of `apply` is the function itself, `args` is a `tuple` of extra positional arguments passed after each element (here `power=2`), and any remaining keyword arguments (here `delta=1`) are forwarded to the function as well.
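
The same argument-passing pattern in a smaller form may be easier to trace. The helper `scale` below is hypothetical and exists only for this sketch.

```python
import pandas as pd

def scale(x, factor, offset=0):
    """Hypothetical helper: multiply by `factor`, then add `offset`."""
    return x * factor + offset

s = pd.Series([1, 2, 3])

# `x` is supplied by pandas; `factor` comes from args, `offset` from the keyword.
via_apply = s.apply(scale, args=(10,), offset=1)

# The same call written with a lambda, which some readers find easier to follow.
via_lambda = s.apply(lambda x: scale(x, 10, offset=1))

print(via_apply.tolist())
```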

To apply a function to each element of a Series (a row or column of a DataFrame), you can use the `map` method (check the type of the `'Deaths'` column first; do you remember how to do that?)

In [104]:
# get 'Deaths' column where NaN replaced by 0
sdf2['Deaths'].map(lambda x: int(x) if pd.notnull(x) else 0).head(10)

0     84
1    263
2     50
3     59
4    125
5     33
6     65
7    281
8    102
9     37
Name: Deaths, dtype: int64

In [105]:
sdf2['Deaths'].fillna(0).astype(int).head(10)

0     84
1    263
2     50
3     59
4    125
5     33
6     65
7    281
8    102
9     37
Name: Deaths, dtype: int32

Here we used the `astype()` method to change the type of the column's elements. But why did we call `fillna(0)` first?
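
One way to see the answer is to try the cast on data that does contain a gap. The sketch below uses a made-up three-element Series, since the `Deaths` column shown here happens to have no missing values.

```python
import numpy as np
import pandas as pd

with_gap = pd.Series([84.0, np.nan, 50.0])

# NaN has no integer representation, so a direct cast fails.
try:
    with_gap.astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("cast failed:", err)

# Filling the gap first makes the cast safe.
filled = with_gap.fillna(0).astype(int)
print(filled.tolist())
```

Whether 0 is a sensible replacement depends on the data; it is used here only because the example above does the same.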

### Sorting

[[back to top]](#Table-of-Contents)

pandas offers two kinds of very fast sorting: sorting by label with `sort_index()` and sorting by actual values with `sort_values()`, both available for Series and DataFrame. Both methods return a new object by default; pass `inplace=True` to modify the original instead. When applying `sort_values()` to a DataFrame, you must name the column (or list of columns) that determines the sort order through the `by` argument. By default pandas sorts in ascending order; set `ascending=False` for descending order.
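
A minimal sketch of both kinds of sorting on a made-up Series, assuming a current pandas version where value sorting is done with `sort_values()`:

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["c", "a", "b"])

by_label = s.sort_index()                      # ordered by index: a, b, c
by_value = s.sort_values(ascending=False)      # ordered by value: 3, 2, 1
original_after_sorting = s.tolist()            # s itself is still [3, 1, 2]

# With inplace=True the method returns None and reorders s itself.
s.sort_values(inplace=True)
print(by_label.tolist(), by_value.tolist(), original_after_sorting, s.tolist())
```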


In [106]:
sdf2.sort_index().head(10)

Unnamed: 0,OBJECTID,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Recovered,Deaths,Active,Admin2,FIPS,Combined_Key,Incident_Rate,People_Tested,People_Hospitalized,UID,ISO3,SHAPE
0,1,Alabama,US,2021-02-24 04:24:54,32.539527,-86.644082,6143,0,84,6059,Autauga,1001,"Autauga, Alabama, US",10995.364155,,,84001001,USA,"{""x"": -86.64408226999996, ""y"": 32.539527450000..."
1,2,Alabama,US,2021-02-24 04:24:54,30.72775,-87.722071,19554,0,263,19291,Baldwin,1003,"Baldwin, Alabama, US",8759.418368,,,84001003,USA,"{""x"": -87.72207057999998, ""y"": 30.727749910000..."
2,3,Alabama,US,2021-02-24 04:24:54,31.868263,-85.387129,2084,0,50,2034,Barbour,1005,"Barbour, Alabama, US",8442.031921,,,84001005,USA,"{""x"": -85.38712859999998, ""y"": 31.868263000000..."
3,4,Alabama,US,2021-02-24 04:24:54,32.996421,-87.125115,2432,0,59,2373,Bibb,1007,"Bibb, Alabama, US",10860.0518,,,84001007,USA,"{""x"": -87.12511459999996, ""y"": 32.996420640000..."
4,5,Alabama,US,2021-02-24 04:24:54,33.982109,-86.567906,6058,0,125,5933,Blount,1009,"Blount, Alabama, US",10476.256355,,,84001009,USA,"{""x"": -86.56790592999994, ""y"": 33.982109180000..."
5,6,Alabama,US,2021-02-24 04:24:54,32.100305,-85.712655,1160,0,33,1127,Bullock,1011,"Bullock, Alabama, US",11484.011484,,,84001011,USA,"{""x"": -85.71265534999998, ""y"": 32.100305330000..."
6,7,Alabama,US,2021-02-24 04:24:54,31.753001,-86.680575,1948,0,65,1883,Butler,1013,"Butler, Alabama, US",10016.454134,,,84001013,USA,"{""x"": -86.68057477999997, ""y"": 31.753000950000..."
7,8,Alabama,US,2021-02-24 04:24:54,33.774837,-85.826304,13063,0,281,12782,Calhoun,1015,"Calhoun, Alabama, US",11498.613617,,,84001015,USA,"{""x"": -85.82630385999994, ""y"": 33.774837270000..."
8,9,Alabama,US,2021-02-24 04:24:54,32.913601,-85.390727,3382,0,102,3280,Chambers,1017,"Chambers, Alabama, US",10170.205088,,,84001017,USA,"{""x"": -85.39072748999996, ""y"": 32.913600790000..."
9,10,Alabama,US,2021-02-24 04:24:54,34.17806,-85.60639,1757,0,37,1720,Cherokee,1019,"Cherokee, Alabama, US",6707.13086,,,84001019,USA,"{""x"": -85.60638967999995, ""y"": 34.178059830000..."


In [107]:
sdf2.sort_index(axis=1).sort_index(ascending=False).head(10)

Unnamed: 0,Active,Admin2,Combined_Key,Confirmed,Country_Region,Deaths,FIPS,ISO3,Incident_Rate,Last_Update,Lat,Long_,OBJECTID,People_Hospitalized,People_Tested,Province_State,Recovered,SHAPE,UID
3268,616,Weston,"Weston, Wyoming, US",621,US,5,56045,USA,8964.919879,2021-02-24 04:24:54,43.839612,-104.567488,3269,,,Wyoming,0,"{""x"": -104.56748809999999, ""y"": 43.83961191000...",84056045
3267,854,Washakie,"Washakie, Wyoming, US",880,US,26,56043,USA,11274.823831,2021-02-24 04:24:54,43.904516,-107.680187,3268,,,Wyoming,0,"{""x"": -107.68018699999999, ""y"": 43.90451606000...",84056043
3266,0,Unassigned,"Unassigned, Wyoming, US",0,US,0,90056,USA,,2021-02-24 04:24:54,,,3267,,,Wyoming,0,,84090056
3265,2024,Uinta,"Uinta, Wyoming, US",2036,US,12,56041,USA,10066.25136,2021-02-24 04:24:54,41.287818,-110.547578,3266,,,Wyoming,0,"{""x"": -110.54757819999998, ""y"": 41.28781830000...",84056041
3264,3324,Teton,"Teton, Wyoming, US",3333,US,9,56039,USA,14204.739175,2021-02-24 04:24:54,43.935225,-110.58908,3265,,,Wyoming,0,"{""x"": -110.58908009999999, ""y"": 43.93522482000...",84056039
3263,3665,Sweetwater,"Sweetwater, Wyoming, US",3699,US,34,56037,USA,8735.800487,2021-02-24 04:24:54,41.659439,-108.882788,3264,,,Wyoming,0,"{""x"": -108.8827882, ""y"": 41.659438960000045, ""...",84056037
3262,664,Sublette,"Sublette, Wyoming, US",671,US,7,56035,USA,6825.348388,2021-02-24 04:24:54,42.765583,-109.913092,3263,,,Wyoming,0,"{""x"": -109.91309219999994, ""y"": 42.76558279000...",84056035
3261,2972,Sheridan,"Sheridan, Wyoming, US",2999,US,27,56033,USA,9837.625062,2021-02-24 04:24:54,44.790489,-106.886239,3262,,,Wyoming,0,"{""x"": -106.88623889999997, ""y"": 44.79048913000...",84056033
3260,572,Platte,"Platte, Wyoming, US",583,US,11,56031,USA,6946.264744,2021-02-24 04:24:54,42.132991,-104.966331,3261,,,Wyoming,0,"{""x"": -104.96633099999997, ""y"": 42.13299116000...",84056031
3259,2577,Park,"Park, Wyoming, US",2603,US,26,56029,USA,8916.215661,2021-02-24 04:24:54,44.521575,-109.585283,3260,,,Wyoming,0,"{""x"": -109.58528249999995, ""y"": 44.52157546000...",84056029


In [110]:
# sort by 'Deaths', then 'Active', both descending, with NaNs placed first
sdf2.sort_values(['Deaths', 'Active'], ascending=[False, False], na_position='first').head(10)

Unnamed: 0,OBJECTID,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Recovered,Deaths,Active,Admin2,FIPS,Combined_Key,Incident_Rate,People_Tested,People_Hospitalized,UID,ISO3,SHAPE
209,210,California,US,2021-02-24 04:24:54,34.308284,-118.228241,1183378,0,20057,1163321,Los Angeles,6037,"Los Angeles, California, US",11787.681912,,,84006037,USA,"{""x"": -118.22824109999999, ""y"": 34.30828379000..."
625,626,Illinois,US,2021-02-24 04:24:54,41.841448,-87.816588,470933,0,9291,461642,Cook,17031,"Cook, Illinois, US",9143.916401,,,84017031,USA,"{""x"": -87.81658793999998, ""y"": 41.841448490000..."
1892,1893,New York,US,2021-02-24 04:24:54,40.636182,-73.949356,203458,0,8973,194485,Kings,36047,"Kings, New York, US",7947.879275,,,84036047,USA,"{""x"": -73.94935551999998, ""y"": 40.636182500000..."
105,106,Arizona,US,2021-02-24 04:24:54,33.348359,-112.491815,506662,0,8909,497753,Maricopa,4013,"Maricopa, Arizona, US",11295.768908,,,84004013,USA,"{""x"": -112.49181539999995, ""y"": 33.34835867000..."
1910,1911,New York,US,2021-02-24 04:24:54,40.710881,-73.816847,205182,0,8650,196532,Queens,36081,"Queens, New York, US",9103.590377,,,84036081,USA,"{""x"": -73.81684711999998, ""y"": 40.710881240000..."
1871,1872,New York,US,2021-02-24 04:24:54,40.852093,-73.862828,139731,0,5796,133935,Bronx,36005,"Bronx, New York, US",9852.651975,,,84036005,USA,"{""x"": -73.86282754999996, ""y"": 40.852093010000..."
372,373,Florida,US,2021-02-24 04:24:54,25.611236,-80.551706,404499,0,5338,399161,Miami-Dade,12086,"Miami-Dade, Florida, US",14888.035805,,,84012086,USA,"{""x"": -80.55170586999998, ""y"": 25.611236200000..."
2758,2759,Texas,US,2021-02-24 04:24:54,29.858649,-95.393395,343573,0,4884,338689,Harris,48201,"Harris, Texas, US",7289.397612,,,84048201,USA,"{""x"": -95.39339520999994, ""y"": 29.858649390000..."
1343,1344,Michigan,US,2021-02-24 04:24:54,42.280984,-83.281255,100878,0,4124,96754,Wayne,26163,"Wayne, Michigan, US",5766.622098,,,84026163,USA,"{""x"": -83.28125499999999, ""y"": 42.280984050000..."
220,221,California,US,2021-02-24 04:24:54,33.701475,-117.7646,259857,0,3848,256009,Orange,6059,"Orange, California, US",8182.689001,,,84006059,USA,"{""x"": -117.76459979999998, ""y"": 33.70147516000..."


Here the first argument is a `list` of the DataFrame's columns to sort by, the second gives the sort order for each corresponding column, and the last sets where null values are placed.
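
The effect of the three arguments is easier to see on data that actually contains a missing value and a tie. The `toy` DataFrame below is made up for this sketch.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "state": ["TX", "AL", "TX", "AL", "NY"],
    "deaths": [5.0, np.nan, 9.0, 2.0, 9.0],
})

# deaths descending, ties broken by state ascending, NaN rows placed first.
ordered = toy.sort_values(
    ["deaths", "state"],
    ascending=[False, True],
    na_position="first",
)
print(ordered)
```

The row with the missing `deaths` value comes first, and the two rows tied at 9.0 are then ordered alphabetically by `state`.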

And let’s give an example of Series sorting:

In [111]:
sdf2['Combined_Key'].sort_values()

2447    Abbeville, South Carolina, US
1135            Acadia, Louisiana, US
2942           Accomack, Virginia, US
565                    Ada, Idaho, US
807                   Adair, Iowa, US
                    ...              
114                 Yuma, Arizona, US
315                Yuma, Colorado, US
2911                Zapata, Texas, US
2912                Zavala, Texas, US
2560        Ziebach, South Dakota, US
Name: Combined_Key, Length: 3269, dtype: object

Note that earlier pandas versions (before 0.17.0) used different methods for sorting by values: `order()` for Series and `sort()` for DataFrame. Both have since been replaced by `sort_values()`.

It is important to note that Series has the `nsmallest()` and `nlargest()` methods which return the smallest or largest `n` values. For a large Series this can be much faster than sorting the entire Series and calling `head(n)` on the result.


In [112]:
sdf2['Deaths'].nlargest(3)

209     20057
625      9291
1892     8973
Name: Deaths, dtype: int64

In [117]:
sdf2['Deaths'].nsmallest(5)

52    0
64    0
70    0
73    0
74    0
Name: Deaths, dtype: int64
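
As a small check of the equivalence mentioned above, the sketch below compares `nlargest()` with sorting followed by `head()` on the first ten `Deaths` values shown earlier. The values are all distinct, so ties do not come into play.

```python
import pandas as pd

# First ten 'Deaths' values from the outputs above.
deaths = pd.Series([84, 263, 50, 59, 125, 33, 65, 281, 102, 37])

top3 = deaths.nlargest(3)
top3_sorted = deaths.sort_values(ascending=False).head(3)
bottom2 = deaths.nsmallest(2)

print(top3.equals(top3_sorted))
print(bottom2)
```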

### Selecting by type

[[back to top]](#Table-of-Contents)

You already know how to see the type of each column of a DataFrame (with `dtypes`, for example) and how to change the type of a DataFrame's column (with the `astype()` method). But what if you need to select the columns of a certain type? The `select_dtypes()` method makes this very easy. Let's create a DataFrame with data of many different types to demonstrate it, rather than use one of the provided datasets.


In [118]:
import datetime
types_df = pd.DataFrame({  'int': list(range(3)),
                           'float': [1.1, 2.2, 3.3],
                           'bool': [False, True, False],
                           'string': list('abc'),
                           'undefined': [2>1, pd.isnull(np.inf),isinstance([],list)],
                           'shuffled': [datetime.datetime.now(), [np.nan, np.inf], type('A')],
                           'date': pd.date_range('20151120', periods=3).values
                        })
types_df

Unnamed: 0,int,float,bool,string,undefined,shuffled,date
0,0,1.1,False,a,True,2021-02-24 08:32:38.137314,2015-11-20
1,1,2.2,True,b,False,"[nan, inf]",2015-11-21
2,2,3.3,False,c,True,<class 'str'>,2015-11-22


In [119]:
types_df.dtypes

int                   int64
float               float64
bool                   bool
string               object
undefined              bool
shuffled             object
date         datetime64[ns]
dtype: object

Note that pandas reports the column of Python `str` values as dtype `object`.

Let’s select only boolean columns


In [120]:
types_df.select_dtypes(include=['bool'])   
# or types_df.select_dtypes(include=[bool])

Unnamed: 0,bool,undefined
0,False,True
1,True,False
2,False,True


or keep all columns that are of neither `bool` nor `object` type

In [121]:
types_df.select_dtypes(exclude=['bool', 'object']) 
# or types_df.select_dtypes(include=['datetime64[ns]','float64', 'int64'])


Unnamed: 0,int,float,date
0,0,1.1,2015-11-20
1,1,2.2,2015-11-21
2,2,3.3,2015-11-22
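
`select_dtypes()` also accepts the generic selector `'number'`, which gives a shorter way to pick numeric columns than comparing `dtypes` against `np.int64` and `np.float64` as was done in the function application examples. The sketch below uses a reduced version of `types_df`; the names `numeric`, `numeric_cols` and `means` are introduced only for illustration.

```python
import pandas as pd

types_df = pd.DataFrame({
    "int": [0, 1, 2],
    "float": [1.1, 2.2, 3.3],
    "bool": [False, True, False],
    "string": list("abc"),
    "date": pd.date_range("20151120", periods=3),
})

# 'number' covers integer and float columns; bool is not treated as numeric here.
numeric = types_df.select_dtypes(include="number")
numeric_cols = numeric.columns.tolist()
means = numeric.mean()

print(numeric_cols)
print(means)
```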
