# Basic Rows Analysis

This notebook covers the following basic row operations:

1. Selecting specific rows
2. Skipping rows in a dataframe
3. Handling index in a dataframe
4. Selecting specific data (Sub-Setting)
5. Selecting data with `loc` and `iloc`
6. Selecting data with `at` and `iat`

In [1]:
import pandas as pd

data = pd.read_csv('../00_Datasets/Demographic-Data.csv')

## Selecting Specific Rows

In [2]:
# Selecting the rows at positions 1 to 4 (the first row is at position 0):

data[1:5]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
1,Afghanistan,AFG,35.253,5.9,Low income
2,Angola,AGO,45.985,19.1,Upper middle income
3,Albania,ALB,12.877,57.2,Upper middle income
4,United Arab Emirates,ARE,11.044,88.0,High income


In [3]:
# Selecting all rows from position 100 onward:

data[100:]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
100,Libya,LBY,21.425,16.5,Upper middle income
101,St. Lucia,LCA,15.430,46.2,Upper middle income
102,Liechtenstein,LIE,9.200,93.8,High income
103,Sri Lanka,LKA,17.863,21.9,Lower middle income
104,Lesotho,LSO,28.738,5.0,Lower middle income
...,...,...,...,...,...
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
191,South Africa,ZAF,20.850,46.5,Upper middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income


In [4]:
# Selecting all rows up to and including position 100 (the stop value 101 is excluded):

data[:101]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
0,Aruba,ABW,10.244,78.90,High income
1,Afghanistan,AFG,35.253,5.90,Low income
2,Angola,AGO,45.985,19.10,Upper middle income
3,Albania,ALB,12.877,57.20,Upper middle income
4,United Arab Emirates,ARE,11.044,88.00,High income
...,...,...,...,...,...
96,Kuwait,KWT,20.575,75.46,High income
97,Lao PDR,LAO,27.051,12.50,Lower middle income
98,Lebanon,LBN,13.426,70.50,Upper middle income
99,Liberia,LBR,35.521,3.20,Low income


In [5]:
# Selecting every 5th row:

data[::5]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
0,Aruba,ABW,10.244,78.9,High income
5,Argentina,ARG,17.716,59.9,High income
10,Azerbaijan,AZE,18.3,58.7,Upper middle income
15,Bangladesh,BGD,20.142,6.63,Lower middle income
20,Belarus,BLR,12.5,54.17,Upper middle income
25,Barbados,BRB,12.188,73.0,High income
30,Canada,CAN,10.9,85.8,High income
35,Cameroon,CMR,37.236,6.4,Lower middle income
40,Costa Rica,CRI,15.022,45.96,Upper middle income
45,Germany,DEU,8.5,84.17,High income


In [6]:
# Selecting every 5th row from position 10 through 100:

data[10:101:5]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
10,Azerbaijan,AZE,18.3,58.7,Upper middle income
15,Bangladesh,BGD,20.142,6.63,Lower middle income
20,Belarus,BLR,12.5,54.17,Upper middle income
25,Barbados,BRB,12.188,73.0,High income
30,Canada,CAN,10.9,85.8,High income
35,Cameroon,CMR,37.236,6.4,Lower middle income
40,Costa Rica,CRI,15.022,45.96,Upper middle income
45,Germany,DEU,8.5,84.17,High income
50,Ecuador,ECU,21.07,40.353684,Upper middle income
55,Ethiopia,ETH,32.925,1.9,Low income


In [7]:
# Reverse the dataframe:

data[::-1]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
194,Zimbabwe,ZWE,35.715,18.5,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
191,South Africa,ZAF,20.850,46.5,Upper middle income
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
...,...,...,...,...,...
4,United Arab Emirates,ARE,11.044,88.0,High income
3,Albania,ALB,12.877,57.2,Upper middle income
2,Angola,AGO,45.985,19.1,Upper middle income
1,Afghanistan,AFG,35.253,5.9,Low income
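
The slices used in this section all follow the `start:stop:step` pattern, where `stop` is excluded. As a minimal sketch, the six-row `df` below is a hypothetical stand-in for the dataset (values copied from the outputs above), so the example runs without the CSV file:

```python
import pandas as pd

# Hypothetical six-row stand-in for the demographic dataset
df = pd.DataFrame({
    "Country": ["Aruba", "Afghanistan", "Angola", "Albania",
                "United Arab Emirates", "Argentina"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877, 11.044, 17.716],
})

first_rows = df[1:4]     # positions 1, 2, 3 (stop is excluded)
every_other = df[::2]    # positions 0, 2, 4
reversed_df = df[::-1]   # same rows, last to first

print(first_rows.index.tolist())   # [1, 2, 3]
print(every_other.index.tolist())  # [0, 2, 4]
print(reversed_df.index.tolist())  # [5, 4, 3, 2, 1, 0]
```

Note that the original index labels travel with the rows, which is why the reversed frame still shows its labels in descending order.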


## Skipping Rows

### Skipping Rows During Import

In [8]:
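# Importing while skipping the first 5 lines of the file.
# The header line counts as one of them, so the next line (United Arab Emirates) is read as the header: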

skipped_df = pd.read_csv('../00_Datasets/Demographic-Data.csv', skiprows=5)
skipped_df

Unnamed: 0,United Arab Emirates,ARE,11.044,88,High income
0,Argentina,ARG,17.716,59.9000,High income
1,Armenia,ARM,13.308,41.9000,Lower middle income
2,Antigua and Barbuda,ATG,16.447,63.4000,High income
3,Australia,AUS,13.200,83.0000,High income
4,Austria,AUT,9.400,80.6188,High income
...,...,...,...,...,...
185,"Yemen, Rep.",YEM,32.947,20.0000,Lower middle income
186,South Africa,ZAF,20.850,46.5000,Upper middle income
187,"Congo, Dem. Rep.",COD,42.394,2.2000,Low income
188,Zambia,ZMB,40.471,15.4000,Lower middle income
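
Because an integer `skiprows` also consumes the header line, it may help to compare it with the list form of the same parameter, which skips only the listed 0-indexed line numbers. The sketch below uses a hypothetical in-memory CSV (`csv_text`) in place of `Demographic-Data.csv`:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for Demographic-Data.csv
csv_text = """Country,CountryCode,BirthRate
Aruba,ABW,10.244
Afghanistan,AFG,35.253
Angola,AGO,45.985
Albania,ALB,12.877
"""

# An integer skips that many lines from the top, header included,
# so the first remaining line is used as the header.
header_lost = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A list of 0-indexed line numbers skips only those lines,
# so line 0 (the header) survives.
header_kept = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(header_lost.columns.tolist())     # ['Afghanistan', 'AFG', '35.253']
print(header_kept.columns.tolist())     # ['Country', 'CountryCode', 'BirthRate']
print(header_kept["Country"].tolist())  # ['Angola', 'Albania']
```

For the dataset used here, the list form would be `skiprows=[1, 2, 3, 4, 5]` if the intent is to drop the first five data rows while keeping the column names.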


## Index Manipulation

### Importing with Specific Index

In [9]:
# Importing with "CountryCode" as the index:

specific_df = pd.read_csv('../00_Datasets/Demographic-Data.csv', index_col='CountryCode')
specific_df

CountryCode,Country,BirthRate,InternetUsers,IncomeGroup
ABW,Aruba,10.244,78.9,High income
AFG,Afghanistan,35.253,5.9,Low income
AGO,Angola,45.985,19.1,Upper middle income
ALB,Albania,12.877,57.2,Upper middle income
ARE,United Arab Emirates,11.044,88.0,High income
...,...,...,...,...
YEM,"Yemen, Rep.",32.947,20.0,Lower middle income
ZAF,South Africa,20.850,46.5,Upper middle income
COD,"Congo, Dem. Rep.",42.394,2.2,Low income
ZMB,Zambia,40.471,15.4,Lower middle income


In [10]:
# Importing with the first column ("Country") as the index instead of the default integer index:

without_df = pd.read_csv('../00_Datasets/Demographic-Data.csv', index_col=0)
without_df

Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
Aruba,ABW,10.244,78.9,High income
Afghanistan,AFG,35.253,5.9,Low income
Angola,AGO,45.985,19.1,Upper middle income
Albania,ALB,12.877,57.2,Upper middle income
United Arab Emirates,ARE,11.044,88.0,High income
...,...,...,...,...
"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
South Africa,ZAF,20.850,46.5,Upper middle income
"Congo, Dem. Rep.",COD,42.394,2.2,Low income
Zambia,ZMB,40.471,15.4,Lower middle income


## Sub-Setting the DataFrame

In [11]:
# Selecting BirthRate and InternetUsers for the rows at positions 5 to 10:

cols = ['BirthRate', 'InternetUsers']

data[cols][5:11]

Unnamed: 0,BirthRate,InternetUsers
5,17.716,59.9
6,13.308,41.9
7,16.447,63.4
8,13.2,83.0
9,9.4,80.6188
10,18.3,58.7


## Selecting data with `loc` and `iloc`

The `loc` and `iloc` indexers select rows (and, optionally, columns) from a dataframe.

- `iloc` selects by integer position. To sub-set the dataframe with `iloc`, the columns must also be given by integer position, with the first column at position `0`.

- `loc` selects by label. Row labels come from the index, which by default holds integers starting from zero.
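
With the default index, labels and positions look identical, so the difference between the two indexers is easy to miss. The sketch below, on a hypothetical five-row stand-in `df`, shows two places where they diverge: slice endpoints, and a non-default index.

```python
import pandas as pd

# Hypothetical five-row stand-in for the demographic dataset
df = pd.DataFrame({
    "Country": ["Aruba", "Afghanistan", "Angola", "Albania",
                "United Arab Emirates"],
    "CountryCode": ["ABW", "AFG", "AGO", "ALB", "ARE"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877, 11.044],
})

# Slices differ: iloc excludes the stop position, loc includes the stop label
by_position = df.iloc[1:3]   # positions 1 and 2
by_label = df.loc[1:3]       # labels 1, 2 and 3

# With a non-default index, loc needs the labels while iloc still counts from 0
coded = df.set_index("CountryCode")
rate_by_label = coded.loc["AFG", "BirthRate"]
rate_by_position = coded.iloc[1, 1]

print(len(by_position), len(by_label))  # 2 3
print(rate_by_label, rate_by_position)  # 35.253 35.253
```

After `set_index("CountryCode")`, the remaining columns are `Country` and `BirthRate`, which is why `BirthRate` sits at column position 1 in `coded`.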

### The `iloc` Indexer

In [12]:
# To get details of first row:

data.iloc[0]

Country                Aruba
CountryCode              ABW
BirthRate             10.244
InternetUsers           78.9
IncomeGroup      High income
Name: 0, dtype: object

In [16]:
# Last row of dataframe:

data.iloc[-1]

Country            Zimbabwe
CountryCode             ZWE
BirthRate            35.715
InternetUsers          18.5
IncomeGroup      Low income
Name: 194, dtype: object

In [15]:
# First column of the dataframe:

data.iloc[:,0]

0                     Aruba
1               Afghanistan
2                    Angola
3                   Albania
4      United Arab Emirates
               ...         
190             Yemen, Rep.
191            South Africa
192        Congo, Dem. Rep.
193                  Zambia
194                Zimbabwe
Name: Country, Length: 195, dtype: object

In [17]:
# last column of the dataframe:

data.iloc[:,-1]

0              High income
1               Low income
2      Upper middle income
3      Upper middle income
4              High income
              ...         
190    Lower middle income
191    Upper middle income
192             Low income
193    Lower middle income
194             Low income
Name: IncomeGroup, Length: 195, dtype: object

In [20]:
# To get specific rows:

data.iloc[[0,1,5]]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
0,Aruba,ABW,10.244,78.9,High income
1,Afghanistan,AFG,35.253,5.9,Low income
5,Argentina,ARG,17.716,59.9,High income


In [25]:
# To get specific rows for a single column:

data.iloc[[0,1],2]

0    10.244
1    35.253
Name: BirthRate, dtype: float64

In [21]:
# To get multiple contiguous rows:

data.iloc[0:5]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
0,Aruba,ABW,10.244,78.9,High income
1,Afghanistan,AFG,35.253,5.9,Low income
2,Angola,AGO,45.985,19.1,Upper middle income
3,Albania,ALB,12.877,57.2,Upper middle income
4,United Arab Emirates,ARE,11.044,88.0,High income


In [22]:
# To get multiple contiguous columns:

data.iloc[:,0:3]

Unnamed: 0,Country,CountryCode,BirthRate
0,Aruba,ABW,10.244
1,Afghanistan,AFG,35.253
2,Angola,AGO,45.985
3,Albania,ALB,12.877
4,United Arab Emirates,ARE,11.044
...,...,...,...
190,"Yemen, Rep.",YEM,32.947
191,South Africa,ZAF,20.850
192,"Congo, Dem. Rep.",COD,42.394
193,Zambia,ZMB,40.471


In [23]:
# To get specific rows and columns (non-contiguous):

data.iloc[[1,8,12], [0,2,4]]

Unnamed: 0,Country,BirthRate,IncomeGroup
1,Afghanistan,35.253,Low income
8,Australia,13.2,High income
12,Belgium,11.2,High income


In [24]:
# To get contiguous rows and columns:

data.iloc[1:5, 0:3]

Unnamed: 0,Country,CountryCode,BirthRate
1,Afghanistan,AFG,35.253
2,Angola,AGO,45.985
3,Albania,ALB,12.877
4,United Arab Emirates,ARE,11.044


### The `loc` Indexer

In [9]:
# Accessing rows by loc:

data.loc[0]

Country                Aruba
CountryCode              ABW
BirthRate             10.244
InternetUsers           78.9
IncomeGroup      High income
Name: 0, dtype: object

In [13]:
# Accessing multiple rows:

data.loc[[0,1]]

Unnamed: 0,Country,CountryCode,BirthRate,InternetUsers,IncomeGroup
0,Aruba,ABW,10.244,78.9,High income
1,Afghanistan,AFG,35.253,5.9,Low income


In [15]:
# To sub-set the data (first 2 rows of BirthRate):

data.loc[[0,1], "BirthRate"]

0    10.244
1    35.253
Name: BirthRate, dtype: float64

In [18]:
# Sub-setting multiple rows and multiple columns:

data.loc[[0,1], ["BirthRate", "InternetUsers"]]

Unnamed: 0,BirthRate,InternetUsers
0,10.244,78.9
1,35.253,5.9


In [17]:
# Getting a row by its label from the dataframe indexed by "CountryCode":

specific_df.loc['AFG']

Country          Afghanistan
BirthRate             35.253
InternetUsers            5.9
IncomeGroup       Low income
Name: AFG, dtype: object

## Selecting data with `at` and `iat`

To access a scalar value (i.e. a single element in the dataframe), the fastest way is to use `at` and `iat`.

- `at` provides label-based scalar lookups.
  We need to provide the row label and the column name to get the element.

- `iat` provides integer-based lookups.
  We need to provide the integer positions of the row and the column to get the element.
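
The label/position distinction is clearest when the row labels are not integers. As a minimal sketch, the three-row `df` below is a hypothetical stand-in that uses country codes as row labels:

```python
import pandas as pd

# Hypothetical three-row stand-in, with country codes as row labels
df = pd.DataFrame(
    {
        "Country": ["Aruba", "Afghanistan", "Angola"],
        "BirthRate": [10.244, 35.253, 45.985],
    },
    index=["ABW", "AFG", "AGO"],
)

by_label = df.at["AFG", "BirthRate"]  # row label, column label
by_position = df.iat[1, 1]            # row position, column position

print(by_label, by_position)  # 35.253 35.253

# at and iat also support assigning a single cell
df.at["AGO", "BirthRate"] = 46.0
print(df.iat[2, 1])  # 46.0
```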

### The `at` Method

In [26]:
# Getting the "BirthRate" value for the row labelled 4 (the 5th row):

data.at[4, "BirthRate"]

11.044

### The `iat` Method

In [30]:
# Getting the value at row position 4 (the 5th row) and column position 2 (the 3rd column, "BirthRate"):

data.iat[4,2]

11.044

The major differences between `loc` & `iloc` and `at` & `iat` are as follows:

- `at` and `iat` are meant to access a scalar, that is, a single element in the dataframe.
- `loc` and `iloc` are meant to access several elements at the same time, potentially to perform vectorized operations.
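
The contrast can be seen in the type of object each accessor returns. The sketch below uses a hypothetical three-row stand-in `df`; doubling the values is only an illustrative vectorized operation, not something the dataset calls for:

```python
import pandas as pd

# Hypothetical three-row stand-in for the demographic dataset
df = pd.DataFrame({
    "Country": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
    "InternetUsers": [78.9, 5.9, 19.1],
})

# loc returns a Series or DataFrame, so an operation applies to every selected cell
subset = df.loc[0:1, ["BirthRate", "InternetUsers"]]
doubled = subset * 2

# at returns one single value
single = df.at[0, "BirthRate"]

print(type(subset).__name__)  # DataFrame
print(doubled.shape)          # (2, 2)
print(single)                 # 10.244
```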