## Pandas: Checking the Data

In pandas, every change we make to the data should be checked. There are several functions for this. Let's look briefly at them.

After reading in the data, it is always good to check that everything went well. Any output can be checked with the print() function. The challenge is that large data files might not print nicely on screen with print().

In [5]:
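import pandas as pd
# Raw string, so the backslashes in the Windows path are not treated as escapes
fp = r'C:\Users\Gokul G\Desktop\WORK\GISISH\GIS-ish\data\Kanpur.csv'
# Skip the first 6 lines of the file, index by "Month", and read -999 as NaN
data = pd.read_csv(fp, sep=',', index_col="Month", skiprows=6, na_values=[-999.0, -999])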

In [11]:
print(data)

          AOD_1640nm  AOD_1020nm  AOD_870nm  AOD_865nm  AOD_779nm  AOD_675nm  \
Month                                                                          
2001-JAN         NaN    0.176252   0.224173        NaN        NaN   0.313830   
2001-FEB         NaN    0.207033   0.249738        NaN        NaN   0.327059   
2001-MAR         NaN    0.222493   0.249700        NaN        NaN   0.294628   
2001-APR         NaN    0.317698   0.338976        NaN        NaN   0.373772   
2001-MAY         NaN    0.671964   0.702189        NaN        NaN   0.752185   
...              ...         ...        ...        ...        ...        ...   
2022-AUG         NaN         NaN        NaN        NaN        NaN        NaN   
2022-SEP         NaN         NaN        NaN        NaN        NaN        NaN   
2022-OCT         NaN         NaN        NaN        NaN        NaN        NaN   
2022-NOV         NaN         NaN        NaN        NaN        NaN        NaN   
2022-DEC         NaN         NaN        
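When the default printout wraps or truncates awkwardly, the display limits can be adjusted temporarily. The following is a minimal sketch on a made-up table: the `demo` name and its values are hypothetical and not taken from the Kanpur file. `pd.option_context` applies the settings only inside the `with` block and restores the previous ones afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a wide table: 30 rows, 12 columns
demo = pd.DataFrame(
    np.arange(360).reshape(30, 12),
    columns=[f"col_{i}" for i in range(12)],
)

# Temporarily limit how much is printed; settings revert after the block
with pd.option_context("display.max_rows", 6, "display.max_columns", 4):
    text = repr(demo)
    print(text)

# The shape is a quick check that does not depend on display settings
print(demo.shape)
```

The same pattern should work for the real `data` object, with limits chosen to suit the screen.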

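To display the data more neatly in a notebook, it is enough to put the variable name alone on the last line of a cell. This still displays the entire DataFrame, though, which pandas abbreviates with ... when it is large.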

In [7]:
data

| Month | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | AOD_620nm | AOD_560nm | AOD_555nm | ... | NUM_POINTS[440-870_Angstrom_Exponent] | NUM_POINTS[380-500_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent] | NUM_POINTS[500-870_Angstrom_Exponent] | NUM_POINTS[340-440_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | Latitude(degrees) | Longitude(degrees) | Elevation(meters) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001-JAN | NaN | 0.176252 | 0.224173 | NaN | NaN | 0.313830 | NaN | NaN | NaN | NaN | ... | 174 | 171 | 172 | 174 | 169 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-FEB | NaN | 0.207033 | 0.249738 | NaN | NaN | 0.327059 | NaN | NaN | NaN | NaN | ... | 988 | 986 | 987 | 988 | 976 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-MAR | NaN | 0.222493 | 0.249700 | NaN | NaN | 0.294628 | NaN | NaN | NaN | NaN | ... | 1169 | 1169 | 1169 | 1169 | 1165 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-APR | NaN | 0.317698 | 0.338976 | NaN | NaN | 0.373772 | NaN | NaN | NaN | NaN | ... | 1207 | 1207 | 1207 | 1207 | 1200 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-MAY | NaN | 0.671964 | 0.702189 | NaN | NaN | 0.752185 | NaN | NaN | NaN | NaN | ... | 691 | 684 | 691 | 691 | 633 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-AUG | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-SEP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-OCT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-NOV | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |


- ## head()

Rather than displaying the entire DataFrame as in the previous cases, it is often better to look at only the first 5–10 rows.

For this we can use data.head() to quickly check the contents of the DataFrame. This function returns the first n rows of the object based on position (5 by default). It is useful for quickly testing whether your object has the right type of data in it.

In [12]:
data.head()

| Month | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | AOD_620nm | AOD_560nm | AOD_555nm | ... | NUM_POINTS[440-870_Angstrom_Exponent] | NUM_POINTS[380-500_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent] | NUM_POINTS[500-870_Angstrom_Exponent] | NUM_POINTS[340-440_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | Latitude(degrees) | Longitude(degrees) | Elevation(meters) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001-JAN | NaN | 0.176252 | 0.224173 | NaN | NaN | 0.31383 | NaN | NaN | NaN | NaN | ... | 174 | 171 | 172 | 174 | 169 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-FEB | NaN | 0.207033 | 0.249738 | NaN | NaN | 0.327059 | NaN | NaN | NaN | NaN | ... | 988 | 986 | 987 | 988 | 976 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-MAR | NaN | 0.222493 | 0.2497 | NaN | NaN | 0.294628 | NaN | NaN | NaN | NaN | ... | 1169 | 1169 | 1169 | 1169 | 1165 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-APR | NaN | 0.317698 | 0.338976 | NaN | NaN | 0.373772 | NaN | NaN | NaN | NaN | ... | 1207 | 1207 | 1207 | 1207 | 1200 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2001-MAY | NaN | 0.671964 | 0.702189 | NaN | NaN | 0.752185 | NaN | NaN | NaN | NaN | ... | 691 | 684 | 691 | 691 | 633 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
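The number of rows returned by head() can be set through its n argument. The sketch below uses a small made-up DataFrame (the `demo` name and its values are hypothetical) to show the default, an explicit n, and a negative n, which returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical monthly values, not taken from the Kanpur file
demo = pd.DataFrame(
    {"value": [0.18, 0.21, 0.22, 0.32, 0.67, 0.70, 0.55]},
    index=["JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL"],
)

first_default = demo.head()    # n defaults to 5
first_three = demo.head(3)     # first 3 rows
all_but_two = demo.head(-2)    # negative n: everything except the last 2 rows

print(len(first_default), len(first_three), len(all_but_two))
```

On the real data, data.head(10) would show the first 10 months in the same way.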


- ## tail()

We can also check the last rows of the data using data.tail(), which returns the last n rows (5 by default). It is useful for quickly verifying data, for example after sorting or appending rows. If n is larger than the number of rows, this function returns all rows.

In [13]:
data.tail()

| Month | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | AOD_620nm | AOD_560nm | AOD_555nm | ... | NUM_POINTS[440-870_Angstrom_Exponent] | NUM_POINTS[380-500_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent] | NUM_POINTS[500-870_Angstrom_Exponent] | NUM_POINTS[340-440_Angstrom_Exponent] | NUM_POINTS[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | Latitude(degrees) | Longitude(degrees) | Elevation(meters) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-AUG | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-SEP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-OCT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-NOV | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
| 2022-DEC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 0 | lev20 | 26.512778 | 80.231639 | 123 |
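The two uses of tail() mentioned above, checking the result of a sort and asking for more rows than exist, can be sketched on a small made-up DataFrame. The `demo` name and its values are hypothetical, not taken from the Kanpur file.

```python
import pandas as pd

# Hypothetical values in unsorted order
demo = pd.DataFrame(
    {"value": [0.67, 0.18, 0.32]},
    index=["MAY", "JAN", "APR"],
)

# After an ascending sort, the last rows should hold the largest values
sorted_demo = demo.sort_values("value")
last_two = sorted_demo.tail(2)

# Asking for more rows than exist returns the whole DataFrame
everything = demo.tail(10)

print(last_two)
print(len(everything))
```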
