### Reading CSV Files in Python

Ah, the good old CSV format. A CSV (Comma-Separated Values) file is one of the most common file types a data scientist works with. These files use a comma (“,”) as the delimiter between values, and each row in a CSV file is one data record.

These files are useful for transferring data from one application to another, which is probably why they are so commonplace in the world of data science.

If you open one in Notepad, you will notice that the values are separated by commas.

### A Few File Access Modes
- ‘r’ – reading from a file (the default)
- ‘w’ – writing to a file (an existing file is overwritten)
- ‘r+’ or ‘w+’ – reading and writing to a file
- ‘a’ – appending to an already existing file
- ‘a+’ – appending to and reading from a file
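The modes listed above can be tried out with Python's built-in `open()`. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name (`modes_demo.txt`), neither of which comes from the original notebook.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

# 'w' creates the file (or truncates an existing one) for writing.
with open(path, "w") as f:
    f.write("first line\n")

# 'a' appends to the end of the file.
with open(path, "a") as f:
    f.write("second line\n")

# 'r+' opens an existing file for both reading and writing.
with open(path, "r+") as f:
    lines = f.read().splitlines()

# 'a+' appends and also allows reading; seek(0) moves back to the start first.
with open(path, "a+") as f:
    f.write("third line\n")
    f.seek(0)
    all_lines = f.read().splitlines()

print(lines)
print(all_lines)
```

Using `with` closes the file automatically at the end of each block, even if an error occurs inside it.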

- The Pandas library makes it very easy to read CSV files using the read_csv() function:

In [None]:
import pandas as pd


In [None]:
NAME    b   c
1   b1    c1


In [None]:
df.to_csv(filepath,index=False)

In [None]:
# import pandas
import pandas as pd

# read csv file into a DataFrame
df = pd.read_csv(r'C:\a\t\csv.csv')
# display DataFrame
df.head()

Unnamed: 0.1,Unnamed: 0,a
0,0,2020-02-01
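The `Unnamed: 0` columns in the output above typically appear when a CSV that was saved together with its row index is read back in. The sketch below illustrates one way to handle this with the `index_col` parameter of `read_csv()`. It uses a small in-memory CSV (an assumption made for illustration) rather than the file from the cell above.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file written with its index included.
csv_text = ",a\n0,2020-02-01\n"

# Without index_col, the unnamed first column is loaded as "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

Another option is to avoid writing the index in the first place by passing `index=False` to `to_csv()`.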


### Reading TSV File
- Some text files use a delimiter other than a comma, like ‘\t’ or ‘;’ etc.
- These can also be imported with the read_csv() function by specifying the delimiter in the `delimiter` parameter, as shown below while reading a ***TSV (Tab Separated Values)*** file:

In [None]:
import pandas as pd

df = pd.read_csv(r'tsv.txt',delimiter='\t')
df

Unnamed: 0,UniqueID,disbursed_amount,asset_cost,ltv,branch_id,supplier_id,manufacturer_id,Current_pincode_ID,Date.of.Birth,Employment.Type,...,SEC.SANCTIONED.AMOUNT,SEC.DISBURSED.AMOUNT,PRIMARY.INSTAL.AMT,SEC.INSTAL.AMT,NEW.ACCTS.IN.LAST.SIX.MONTHS,DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS,AVERAGE.ACCT.AGE,CREDIT.HISTORY.LENGTH,NO.OF_INQUIRIES,loan_default
0,420825,50578,58400,89.55,67,22807,45,1441,01-01-84,Salaried,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
1,537409,47145,65550,73.23,67,22807,45,1502,31-07-85,Self employed,...,0,0,1991,0,0,1,1yrs 11mon,1yrs 11mon,0,1
2,417566,53278,61360,89.63,67,22807,45,1497,24-08-85,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
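The same approach should work for other delimiters. Here is a small sketch for semicolon-separated data, assuming made-up in-memory text in place of a real file. `sep` is the parameter name that `delimiter` is an alias for.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data, used here instead of a file on disk.
text = "name;score\nalice;90\nbob;85\n"

# sep=';' tells read_csv to split each row on semicolons.
df_semi = pd.read_csv(io.StringIO(text), sep=";")

print(df_semi.shape)
print(list(df_semi.columns))
```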


### Reading Excel Files in Python
- Most of you will be quite familiar with Excel files and why they are so widely used to store tabular data. So I’m going to jump right to the code and import an Excel file in Python using Pandas.

- Pandas has a very handy function called read_excel() to read Excel files:

In [None]:
# read Excel file into a DataFrame
df = pd.read_excel('excel.xlsx')
# print values
df.head()

Unnamed: 0,UniqueID,disbursed_amount,asset_cost,ltv,branch_id,supplier_id,manufacturer_id,Current_pincode_ID,Date.of.Birth,Employment.Type,...,SEC.SANCTIONED.AMOUNT,SEC.DISBURSED.AMOUNT,PRIMARY.INSTAL.AMT,SEC.INSTAL.AMT,NEW.ACCTS.IN.LAST.SIX.MONTHS,DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS,AVERAGE.ACCT.AGE,CREDIT.HISTORY.LENGTH,NO.OF_INQUIRIES,loan_default
0,420825,50578,58400,89.55,67,22807,45,1441,1984-01-01,Salaried,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
1,537409,47145,65550,73.23,67,22807,45,1502,1985-07-31,Self employed,...,0,0,1991,0,0,1,1yrs 11mon,1yrs 11mon,0,1
2,417566,53278,61360,89.63,67,22807,45,1497,1985-08-24,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
3,624493,57513,66113,88.48,67,22807,45,1501,1993-12-30,Self employed,...,0,0,31,0,0,0,0yrs 8mon,1yrs 3mon,1,1
4,539055,52378,60300,88.39,67,22807,45,1495,1977-12-09,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,1,1


### Reading Data as Pickle Files in Python
- Pickle files store the serialized form of Python objects. This means objects like list, set, tuple, dict, etc. are converted to a byte stream before being stored on disk, which allows you to continue working with them later on. This is particularly useful when you have trained a machine learning model and want to save it to make predictions later.

- So, if you serialized the objects before saving them, you need to de-serialize them before you use them in your Python programs. This is done using the ***pickle.load()*** function in the pickle module.
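To make the serialize/de-serialize idea concrete, here is a minimal in-memory sketch using `pickle.dumps()` and `pickle.loads()`, the bytes-based counterparts of `pickle.dump()` and `pickle.load()`. The `record` dictionary is a made-up example object.

```python
import pickle

# Hypothetical Python object made of a dict, a list and a tuple.
record = {"name": "model_v1", "weights": [0.1, 0.2], "tags": ("a", "b")}

# Serialize: the object is converted to a bytes value.
payload = pickle.dumps(record)

# De-serialize: the bytes are turned back into an equal, but separate, object.
restored = pickle.loads(payload)

print(type(payload))
print(restored == record)
```

Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.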

In [None]:
df = pd.DataFrame({"a":['2020-02-01']})
df.to_csv('abc.csv')
df['a'] = pd.to_datetime(df['a'])
df['a'].dt.year

0    2020
Name: a, dtype: int64

In [None]:
df = pd.read_csv('abc.csv')
df['a']#.dt.year

0    2020-02-01
Name: a, dtype: object
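The two cells above suggest that a datetime column does not come back as a datetime after a round trip through CSV, since CSV stores only text. One possible way to recover the type while reading is the `parse_dates` parameter of `read_csv()`. The sketch below assumes a small in-memory CSV rather than `abc.csv`.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV with a single date column.
csv_text = "a\n2020-02-01\n"

# Read as-is: the column is loaded as text.
df_plain = pd.read_csv(io.StringIO(csv_text))

# Read with parse_dates: the column is converted to datetimes.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["a"])

print(df_plain["a"].dtype)
print(df_dates["a"].dtype)
print(df_dates["a"].dt.year.iloc[0])
```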

In [None]:
l1 = [df1,df2]

In [None]:
import pickle

with open('pickle.pkl','rb') as file:
    data = pickle.load(file)



df_pkl = pd.DataFrame(data)
df_pkl.head()

Unnamed: 0,UniqueID,disbursed_amount,asset_cost,ltv,branch_id,supplier_id,manufacturer_id,Current_pincode_ID,Date.of.Birth,Employment.Type,...,SEC.SANCTIONED.AMOUNT,SEC.DISBURSED.AMOUNT,PRIMARY.INSTAL.AMT,SEC.INSTAL.AMT,NEW.ACCTS.IN.LAST.SIX.MONTHS,DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS,AVERAGE.ACCT.AGE,CREDIT.HISTORY.LENGTH,NO.OF_INQUIRIES,loan_default
0,420825,50578,58400,89.55,67,22807,45,1441,1984-01-01,Salaried,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
1,537409,47145,65550,73.23,67,22807,45,1502,1985-07-31,Self employed,...,0,0,1991,0,0,1,1yrs 11mon,1yrs 11mon,0,1
2,417566,53278,61360,89.63,67,22807,45,1497,1985-08-24,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
3,624493,57513,66113,88.48,67,22807,45,1501,1993-12-30,Self employed,...,0,0,31,0,0,0,0yrs 8mon,1yrs 3mon,1,1
4,539055,52378,60300,88.39,67,22807,45,1495,1977-12-09,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,1,1


### OR

In [None]:
import pandas as pd


df = pd.read_pickle('pickle.pkl')
df.head()

Unnamed: 0,UniqueID,disbursed_amount,asset_cost,ltv,branch_id,supplier_id,manufacturer_id,Current_pincode_ID,Date.of.Birth,Employment.Type,...,SEC.SANCTIONED.AMOUNT,SEC.DISBURSED.AMOUNT,PRIMARY.INSTAL.AMT,SEC.INSTAL.AMT,NEW.ACCTS.IN.LAST.SIX.MONTHS,DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS,AVERAGE.ACCT.AGE,CREDIT.HISTORY.LENGTH,NO.OF_INQUIRIES,loan_default
0,420825,50578,58400,89.55,67,22807,45,1441,1984-01-01,Salaried,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
1,537409,47145,65550,73.23,67,22807,45,1502,1985-07-31,Self employed,...,0,0,1991,0,0,1,1yrs 11mon,1yrs 11mon,0,1
2,417566,53278,61360,89.63,67,22807,45,1497,1985-08-24,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,0,0
3,624493,57513,66113,88.48,67,22807,45,1501,1993-12-30,Self employed,...,0,0,31,0,0,0,0yrs 8mon,1yrs 3mon,1,1
4,539055,52378,60300,88.39,67,22807,45,1495,1977-12-09,Self employed,...,0,0,0,0,0,0,0yrs 0mon,0yrs 0mon,1,1


There are still more file types that you can read from, so this list is not exhaustive.
- For more details on reading files, see [this video](https://www.youtube.com/watch?v=AdfdY4zGiEk)

**Write a CSV File**
You can save your Pandas DataFrame as a CSV file with .to_csv():

- df.to_csv('data.csv')

That’s it! You’ve created the file data.csv in your current working directory.

This text file contains the data separated with commas. The first column contains the row labels. In some cases, you’ll find them irrelevant.

If you don’t want to keep them, then you can pass the argument index=False to .to_csv().
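The effect of `index=False` can be seen without creating a file, because `.to_csv()` returns the CSV text as a string when no path is given. A small sketch, assuming a made-up two-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration.
df_small = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path argument, to_csv() returns the CSV text as a string.
with_index = df_small.to_csv()
without_index = df_small.to_csv(index=False)

print(with_index)
print(without_index)
```

In the first string each row starts with its row label. In the second string the labels are gone and only the column values remain.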



**Write an Excel File**
Once you have an Excel writer package such as openpyxl installed, you can save your DataFrame in an Excel file with .to_excel():

- df.to_excel('data.xlsx')

The argument 'data.xlsx' represents the target file and, optionally, its path. The above statement should create the file data.xlsx in your current working directory.



**Write Files**
Series and DataFrame objects have methods that enable writing data and labels to the clipboard or files. They’re named with the pattern .to_<file-type>(), where <file-type> is the type of the target file.

You’ve learned about .to_csv() and .to_excel(), but there are others, including:

- .to_json()
- .to_html()
- .to_sql()
- .to_pickle()
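As a quick illustration of the `.to_<file-type>()` pattern, the sketch below calls `.to_json()` and `.to_html()` on a made-up DataFrame. Both return a string when no target file is passed, so nothing is written to disk here.

```python
import json
import pandas as pd

# Hypothetical DataFrame used only for illustration.
df_small = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# One JSON object per row.
json_text = df_small.to_json(orient="records")

# An HTML <table> without the row labels.
html_text = df_small.to_html(index=False)

# Parse the JSON text back into Python objects to inspect it.
records = json.loads(json_text)
print(records)
```

Passing a file path as the first argument to either method writes the output to that file instead of returning it.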


### To export a dataframe safely to .csv file format

In [None]:
dataframe.to_csv('filename.csv',index=False)

### To save any Python dataframe as a pickle (.pkl) file

In [None]:
import pickle

with open('filename.pkl', 'wb') as f:
    pickle.dump(dataframe, f)
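Pandas also offers `.to_pickle()` and `pd.read_pickle()` as shortcuts for the same idea. The sketch below is a round trip under assumed names (a temporary directory and a hypothetical `demo.pkl` file) and checks that a datetime column keeps its type, unlike in a CSV round trip.

```python
import os
import tempfile
import pandas as pd

# Hypothetical DataFrame with a datetime column.
df_dates = pd.DataFrame({"a": pd.to_datetime(["2020-02-01"])})

# Write to a throwaway location so no real files are touched.
path = os.path.join(tempfile.mkdtemp(), "demo.pkl")  # hypothetical file name
df_dates.to_pickle(path)

# Read it back; column types are preserved.
restored = pd.read_pickle(path)

print(restored["a"].dtype)
print(restored.equals(df_dates))
```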

### To save any Python dataframe as an Excel (.xlsx) file

In [None]:
dataframe.to_excel("filename.xlsx",index=False) 

### To save any Python dataframe as a tab-separated (.tsv) file

In [None]:
dataframe.to_csv('filename.tsv',sep='\t')

There are still more file types that you can write to, so this list is not exhaustive. 
- For more details on writing files, see [this video](https://www.youtube.com/watch?v=lRerVytOQLU)