# Viewing and inspecting data with pandas

## Libraries

```python
import pandas as pd
# openpyxl must be installed: pandas uses it to write .xlsx files
from openpyxl.workbook import Workbook
```

### Manipulating the data frame is key to getting what you want out of the data
* Basic selection
* Viewing functions and attributes: ````read_csv````, ````columns````, ````iloc````, ````to_excel````
* Saving the desired values to an Excel sheet
* Working with the CSV file named ````names_dataset.csv````

## Problem Statement

* Imagine you had to deal with a spreadsheet with so many columns that it was hard to read the data fully in your terminal.

* You need to know which columns contain what so that you can access the data you need. To do this, you can inspect the ````columns```` attribute, which is also what you assign to when naming the columns.
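A minimal sketch of inspecting a wide data frame without scrolling through a truncated printout. The 12-column frame is hypothetical, made up only for illustration; `columns.tolist()`, `shape` and the `display.max_columns` option are standard pandas.

```python
import pandas as pd

# Hypothetical wide frame: 3 rows, 12 columns named col_0 ... col_11
wide = pd.DataFrame(
    [[i * 10 + j for j in range(12)] for i in range(3)],
    columns=[f"col_{j}" for j in range(12)],
)

# List every column name instead of relying on the printed table
column_names = wide.columns.tolist()
print(column_names)

# (rows, columns) tells you how large the frame is
print(wide.shape)  # (3, 12)

# Temporarily lift the column limit so no columns are hidden behind "..."
with pd.option_context("display.max_columns", None):
    print(wide.head())
```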


```python
# header=None: the file has no header row, so every line is read as data
df_csv = pd.read_csv('files/names_dataset.csv', header=None)
df_csv
```

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Skippie | Conboy | sconboy0@pcworld.com | Male | $80626.92 |
| 1 | Shell | Kunz | skunz1@theatlantic.com | Female | $81887.16 |
| 2 | Mel | Jencey | mjencey2@lycos.com | Male | $16066.46 |
| 3 | Monte | Kendrew | mkendrew3@unblog.fr | Male | $35525.57 |
| 4 | Jacky | Grout | jgrout4@businesswire.com | Male | $62111.25 |
| ... | ... | ... | ... | ... | ... |
| 95 | Gib | Daine | gdaine2n@a8.net | Male | $67297.03 |
| 96 | Cam | Tethacot | ctethacot2o@springer.com | Female | $5077.27 |
| 97 | Kalil | Cruikshank | kcruikshank2p@mtv.com | Male | $84463.67 |
| 98 | Freddi | Paudin | fpaudin2q@aboutads.info | Female | $77028.00 |
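Why `header=None` matters here: a small sketch using three rows copied from `names_dataset.csv` into an in-memory string (a stand-in for the real file). Without `header=None`, pandas treats the first data row as the column names and the frame loses a row.

```python
import io

import pandas as pd

# Three rows copied from names_dataset.csv; the file has no header line
raw = (
    "Skippie,Conboy,sconboy0@pcworld.com,Male,$80626.92\n"
    "Shell,Kunz,skunz1@theatlantic.com,Female,$81887.16\n"
    "Mel,Jencey,mjencey2@lycos.com,Male,$16066.46\n"
)

# header=None: every line is data and the columns are labelled 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(raw), header=None)
print(no_header.columns.tolist())  # [0, 1, 2, 3, 4]
print(len(no_header))              # 3

# Default behaviour (header=0): the first data row is consumed as column names
default_header = pd.read_csv(io.StringIO(raw))
print(default_header.columns.tolist())
print(len(default_header))         # 2
```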


### 1. How to add headers to a data frame

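* As you can see, the data frame has no column names: because we passed ````header=None````, pandas labels the columns with the integers 0 to 4
* By assigning a list of names to the ````columns```` attribute, you can name the data frame's columns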

```python
df_csv.columns = ['first_name','last_name', 'email','gender','Income']
df_csv
```

| | first_name | last_name | email | gender | Income |
|---|---|---|---|---|---|
| 0 | Skippie | Conboy | sconboy0@pcworld.com | Male | $80626.92 |
| 1 | Shell | Kunz | skunz1@theatlantic.com | Female | $81887.16 |
| 2 | Mel | Jencey | mjencey2@lycos.com | Male | $16066.46 |
| 3 | Monte | Kendrew | mkendrew3@unblog.fr | Male | $35525.57 |
| 4 | Jacky | Grout | jgrout4@businesswire.com | Male | $62111.25 |
| ... | ... | ... | ... | ... | ... |
| 95 | Gib | Daine | gdaine2n@a8.net | Male | $67297.03 |
| 96 | Cam | Tethacot | ctethacot2o@springer.com | Female | $5077.27 |
| 97 | Kalil | Cruikshank | kcruikshank2p@mtv.com | Male | $84463.67 |
| 98 | Freddi | Paudin | fpaudin2q@aboutads.info | Female | $77028.00 |
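The names can also be supplied while reading the file, through the `names` parameter of `read_csv`. A minimal sketch, with two rows copied from the dataset into an in-memory string standing in for the real file:

```python
import io

import pandas as pd

raw = (
    "Skippie,Conboy,sconboy0@pcworld.com,Male,$80626.92\n"
    "Shell,Kunz,skunz1@theatlantic.com,Female,$81887.16\n"
)
column_names = ["first_name", "last_name", "email", "gender", "Income"]

# header=None states that the file has no header line; names= supplies the labels
named = pd.read_csv(io.StringIO(raw), header=None, names=column_names)
print(named)
```

This avoids a separate assignment step, at the cost of fixing the names at load time.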


* The list must contain exactly as many names as the data frame has columns; otherwise pandas raises an error such as: ````ValueError: Length mismatch: Expected axis has 7 elements, new values have 6 elements````
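A minimal sketch, using a small made-up frame with five columns, that triggers the length-mismatch error on purpose and then checks the count before assigning:

```python
import pandas as pd

# Hypothetical 2-row frame with 5 unnamed columns
df = pd.DataFrame([["a", "b", "c", "d", "e"], ["f", "g", "h", "i", "j"]])

message = ""
try:
    # Only 4 names for 5 columns, so pandas refuses the assignment
    df.columns = ["first_name", "last_name", "email", "gender"]
except ValueError as err:
    message = str(err)
    print(message)

# Checking the count first avoids the error
names = ["first_name", "last_name", "email", "gender", "Income"]
if len(names) == len(df.columns):
    df.columns = names
print(df.columns.tolist())
```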

### 2. How to view the values of a column

* Now, let's say you want to view just one column
* You can index by column name; in this case you want to view the data of the ````email```` column

```python
# Single [] returns a Series
df_csv['email']
```

```
0         sconboy0@pcworld.com
1       skunz1@theatlantic.com
2           mjencey2@lycos.com
3          mkendrew3@unblog.fr
4     jgrout4@businesswire.com
                ...           
95             gdaine2n@a8.net
96    ctethacot2o@springer.com
97       kcruikshank2p@mtv.com
98     fpaudin2q@aboutads.info
99     mgeorgescu2r@smh.com.au
Name: email, Length: 100, dtype: object
```

```python
# Double [[]] returns a DataFrame
df_csv[['email']]
```

| | email |
|---|---|
| 0 | sconboy0@pcworld.com |
| 1 | skunz1@theatlantic.com |
| 2 | mjencey2@lycos.com |
| 3 | mkendrew3@unblog.fr |
| 4 | jgrout4@businesswire.com |
| ... | ... |
| 95 | gdaine2n@a8.net |
| 96 | ctethacot2o@springer.com |
| 97 | kcruikshank2p@mtv.com |
| 98 | fpaudin2q@aboutads.info |


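* In both cases you see the values together with their index labels: single brackets return a ````Series````, while double brackets return a ````DataFrame```` with one column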
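A quick way to confirm the difference between single and double brackets is to check the type and shape of each result. A sketch on a hypothetical two-row frame built from the dataset's first rows:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Skippie", "Shell"],
    "email": ["sconboy0@pcworld.com", "skunz1@theatlantic.com"],
})

single = df["email"]      # single brackets -> Series
double = df[["email"]]    # double brackets -> DataFrame with one column

print(type(single).__name__)       # Series
print(type(double).__name__)       # DataFrame
print(single.shape, double.shape)  # (2,) (2, 1)
```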

### 3. How to view data from multiple columns

* To access data from multiple columns, pass the column names as a list
* Double brackets ````[[]]```` are used, as in ````[['email','gender']]````: the outer brackets index the data frame and the inner brackets form the list of columns

```python
df_csv[['email','gender']]
```

| | email | gender |
|---|---|---|
| 0 | sconboy0@pcworld.com | Male |
| 1 | skunz1@theatlantic.com | Female |
| 2 | mjencey2@lycos.com | Male |
| 3 | mkendrew3@unblog.fr | Male |
| 4 | jgrout4@businesswire.com | Male |
| ... | ... | ... |
| 95 | gdaine2n@a8.net | Male |
| 96 | ctethacot2o@springer.com | Female |
| 97 | kcruikshank2p@mtv.com | Male |
| 98 | fpaudin2q@aboutads.info | Female |


### 4. How to use slicing to view the first 3 rows of a single column

* With a large data set you often want to view only certain values, which you can do by slicing
* Slicing lets you pick rows by position, like coordinates
* Here we want to view the first 3 rows of the ````first_name```` column: the slice ````[0:3]```` covers positions 0 to 2, because the end position is excluded

```python
print(df_csv['first_name'][0:3])
```

```
0    Skippie
1      Shell
2        Mel
Name: first_name, dtype: object
```
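The same three values can be reached in a few equivalent ways; `.iloc` makes the position-based intent explicit, and `.head(n)` reads more naturally when you only want the first rows. A sketch on a hypothetical Series holding the first five names:

```python
import pandas as pd

first_names = pd.Series(["Skippie", "Shell", "Mel", "Monte", "Jacky"], name="first_name")

by_slice = first_names[0:3]       # slice as in the notebook; end position 3 is excluded
by_iloc = first_names.iloc[0:3]   # explicitly position-based
by_head = first_names.head(3)     # first n rows

print(by_head.tolist())  # ['Skippie', 'Shell', 'Mel']
```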


### 5. How to view values from a single row
* Use integer-location indexing with ````iloc````
* Put the position of the row you want inside the brackets
* In this case, you access the row at position 4
* The syntax is ````dataframe.iloc[4]````: the data frame, then ````.iloc````, then the row position inside ````[]````

```python
# Single []: the row as a Series
df_csv.iloc[4]
```

```
first_name                       Jacky
last_name                        Grout
email         jgrout4@businesswire.com
gender                            Male
Income                       $62111.25
Name: 4, dtype: object
```

```python
# Double [[]]: the row as a one-row DataFrame
df_csv.iloc[[4]]
```

| | first_name | last_name | email | gender | Income |
|---|---|---|---|---|---|
| 4 | Jacky | Grout | jgrout4@businesswire.com | Male | $62111.25 |
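`iloc` accepts more than a single position: a list of positions, a negative position counted from the end, or a slice of rows. A sketch on a hypothetical five-row frame built from the dataset's first rows:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Skippie", "Shell", "Mel", "Monte", "Jacky"],
    "last_name": ["Conboy", "Kunz", "Jencey", "Kendrew", "Grout"],
})

row = df.iloc[4]          # one row as a Series
rows = df.iloc[[0, 4]]    # a list of positions -> DataFrame
last = df.iloc[-1]        # negative positions count from the end
block = df.iloc[1:3]      # rows 1 and 2; the end position is excluded

print(row["last_name"])              # Grout
print(rows.index.tolist())           # [0, 4]
print(block["first_name"].tolist())  # ['Shell', 'Mel']
```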



```python
print(df_csv.iloc[4])
```

```
first_name                       Jacky
last_name                        Grout
email         jgrout4@businesswire.com
gender                            Male
Income                       $62111.25
Name: 4, dtype: object
```


### 6. How to view values from a single cell
* Pass both the row position and the column position to ````iloc````
* Remember that positions start at 0
* For example, access the value at row 97, column 1
* Here the goal is to pick the value "Cruikshank" from the data frame

```python
# Show the last 10 rows to locate the value
df_csv.tail(10)
```

| | first_name | last_name | email | gender | Income |
|---|---|---|---|---|---|
| 90 | Sherline | Wittman | swittman2i@51.la | Female | $59602.62 |
| 91 | Joceline | Strotone | jstrotone2j@tinyurl.com | Female | $63364.45 |
| 92 | Winonah | Fulbrook | wfulbrook2k@accuweather.com | Female | $17882.15 |
| 93 | Eleanor | Bremen | ebremen2l@amazon.de | Female | $41609.06 |
| 94 | Tuckie | Urch | turch2m@163.com | Male | $26008.44 |
| 95 | Gib | Daine | gdaine2n@a8.net | Male | $67297.03 |
| 96 | Cam | Tethacot | ctethacot2o@springer.com | Female | $5077.27 |
| 97 | Kalil | Cruikshank | kcruikshank2p@mtv.com | Male | $84463.67 |
| 98 | Freddi | Paudin | fpaudin2q@aboutads.info | Female | $77028.00 |
| 99 | Merci | Georgescu | mgeorgescu2r@smh.com.au | Female | $45843.61 |


* The value "Cruikshank" is in row 97 and in the ````last_name```` column, which is the second column, so its position is 1
* Combine both positions inside ````iloc````: ````df_csv.iloc[97, 1]````

```python
# Optional: show which cells hold null values (True marks a missing value)
# df_csv.isnull()
```

```python
print(df_csv.iloc[97,1])
```

```
Cruikshank
```
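Instead of counting rows and columns by eye, the positions can be looked up. In the notebook the index runs from 0 to 99, so row 97 is both a position and a label; the hypothetical three-row frame below keeps the original labels 93, 97 and 98 so that positions and labels differ, which shows when to use `iloc`/`iat` (positions) versus `loc`/`at` (labels).

```python
import pandas as pd

# Hypothetical frame: three rows from the tail of the dataset, keeping their labels
df = pd.DataFrame(
    {"first_name": ["Eleanor", "Kalil", "Freddi"],
     "last_name": ["Bremen", "Cruikshank", "Paudin"]},
    index=[93, 97, 98],
)

# Look up the column position and the label of the matching row
col_pos = df.columns.get_loc("last_name")                 # 1
row_label = df.index[df["last_name"] == "Cruikshank"][0]  # 97
row_pos = df.index.get_loc(row_label)                     # 1 in this small frame

print(df.iloc[row_pos, col_pos])        # position-based
print(df.iat[row_pos, col_pos])         # fast scalar access by position
print(df.loc[row_label, "last_name"])   # label-based
print(df.at[row_label, "last_name"])    # fast scalar access by label
```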


### 7. How to select values, store and save them as a new data frame in an Excel file

* Imagine you want to select a set of values from the data frame, for example the columns *first_name*, *last_name* and *email*
* You can then save those values to a new Excel file

```python
# 1. Select the wanted values
df_csv[['first_name','last_name','email']]
```

| | first_name | last_name | email |
|---|---|---|---|
| 0 | Skippie | Conboy | sconboy0@pcworld.com |
| 1 | Shell | Kunz | skunz1@theatlantic.com |
| 2 | Mel | Jencey | mjencey2@lycos.com |
| 3 | Monte | Kendrew | mkendrew3@unblog.fr |
| 4 | Jacky | Grout | jgrout4@businesswire.com |
| ... | ... | ... | ... |
| 95 | Gib | Daine | gdaine2n@a8.net |
| 96 | Cam | Tethacot | ctethacot2o@springer.com |
| 97 | Kalil | Cruikshank | kcruikshank2p@mtv.com |
| 98 | Freddi | Paudin | fpaudin2q@aboutads.info |


```python
# 2. Store the wanted values in a variable
wanted_values = df_csv[['first_name','last_name','email']]
```

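```python
# 3. Save the wanted values to an Excel file
# index=False leaves the row index out of the sheet; to_excel returns None, so there is nothing to store
wanted_values.to_excel('files/Data_location.xlsx', index=False)
```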
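A minimal round-trip sketch that writes a column selection to Excel and reads it back to confirm what was saved. It assumes openpyxl is installed, as in the import at the top; the two-row frame, the temporary folder and the sheet name `names` are hypothetical stand-ins for `df_csv` and `files/Data_location.xlsx`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "first_name": ["Skippie", "Shell"],
    "last_name": ["Conboy", "Kunz"],
    "email": ["sconboy0@pcworld.com", "skunz1@theatlantic.com"],
    "gender": ["Male", "Female"],
})

wanted_values = df[["first_name", "last_name", "email"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Data_location.xlsx")
    # index=False keeps the row labels out of the sheet; sheet_name names the tab
    wanted_values.to_excel(path, index=False, sheet_name="names")
    # Read the file back to check what was written
    round_trip = pd.read_excel(path, sheet_name="names")

print(round_trip)
```

Reading the file back is a cheap way to catch an unwanted extra index column before sharing the workbook.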

## Reference
[Viewing and inspecting data with pandas](https://www.linkedin.com/learning/using-python-with-excel/viewing-and-inspecting-data-with-pandas?autoplay=true&resume=false&u=2134922)