# Read and Write Excel and CSV Files
Pandas is a powerful tool for reading and writing data in various formats including Excel and CSV. In this module, we will explore how to read and write Excel and CSV files using Pandas.

In [1]:
import pandas as pd

## 01. Read CSV File Using read_csv() Method
To read a CSV file using Pandas, you can use the read_csv() function. This function takes the filename as an argument and returns a Pandas DataFrame object.

In [2]:
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences
filepath = r"D:\Coding\Git Repository\Data-Science-Bootcamp-with-Python\Datasets\stock_data.csv"
df = pd.read_csv(filepath)
df

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata
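
If the file at the path above is not available, the same call can be tried against an in-memory copy of the data. The following is a minimal sketch, assuming the CSV contains exactly the rows shown in the output; `csv_text` and `df_demo` are names introduced here for illustration only.

```python
import io

import pandas as pd

# In-memory stand-in for stock_data.csv (assumed contents, copied from the output above)
csv_text = """tickets,eps,revenue,price,people
GOOGL,27.82,87,845,larry page
WMT,4.61,484,65,n.a.
MSFT,-1,85,64,bill gates
RIL,not available,50,1023,mukesh ambani
TATA,5.6,-1,n.a.,ratan tata
"""

# read_csv() accepts a file path or any file-like object
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)
# eps and price are read as text because they contain "not available" / "n.a."
print(df_demo.dtypes)
```

The `dtypes` output is a quick way to spot columns that contain placeholder text instead of numbers.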


## 02. Skip Rows Using the skiprows Parameter
In Pandas, you can pass the skiprows parameter to read_csv() or read_excel() to skip rows while reading a file. This can be useful when the file starts with title rows, comment lines, or other non-data rows that you want to exclude from the DataFrame. In the example below, skiprows=1 skips the file's header line, so the first data row is used as the column names.

In [3]:
df1 = pd.read_csv(filepath, skiprows=1)
df1

Unnamed: 0,GOOGL,27.82,87,845,larry page
0,WMT,4.61,484,65,n.a.
1,MSFT,-1,85,64,bill gates
2,RIL,not available,50,1023,mukesh ambani
3,TATA,5.6,-1,n.a.,ratan tata
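
skiprows also accepts a list of 0-based line numbers instead of a single count. The sketch below uses a hypothetical file with two junk lines before the real header; the file contents and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file: two non-data lines, then the header, then the data
csv_text = """exported by some tool
internal use only
tickets,eps,revenue
GOOGL,27.82,87
WMT,4.61,484
"""

# An integer skips that many lines from the top of the file
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A list skips specific 0-based line numbers; line 3 (the GOOGL row) is dropped too
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 3])

print(df_int)
print(df_list)
```

In both calls the first line that is not skipped becomes the header.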


## 03. Import Data from CSV with "null header"
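Sometimes you may encounter CSV files that do not have a header row, or whose header row is blank or unusable. Passing header=None to read_csv() tells Pandas to treat every line as data and to label the columns 0, 1, 2, and so on. You can then supply your own column names through the names parameter. The examples below combine header=None with skiprows=1 to drop the file's existing header and simulate a headerless file.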

In [4]:
df2 = pd.read_csv(filepath, skiprows=1, header=None)
df2

Unnamed: 0,0,1,2,3,4
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata


In [5]:
df3 = pd.read_csv(filepath, skiprows=1, header=None, names=["tickets", "eps", "revenue", "price", "people"])
df3

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata
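
When the file does have a header that you simply want to replace, passing header=0 together with names should give the same result as skipping the header line. A small sketch, assuming a cut-down version of the data; the replacement column names are hypothetical.

```python
import io

import pandas as pd

csv_text = """tickets,eps,revenue
GOOGL,27.82,87
WMT,4.61,484
"""

# Hypothetical replacement names, chosen only for this example
new_names = ["symbol", "earnings", "sales"]

# Option A: skip the old header line and treat the rest as headerless data
df_a = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None, names=new_names)

# Option B: tell pandas that line 0 is a header and replace it with new_names
df_b = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

print(df_a.equals(df_b))
```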


## 04. Reading Limited Data from CSV
In Pandas, you can read a limited number of rows from a CSV file using the nrows parameter in the read_csv() function.

In [6]:
df4 = pd.read_csv(filepath, nrows=4)
df4

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
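
nrows counts data rows only, so the header line is not included in the limit. It can also be combined with skiprows to read a window from the middle of a file. The sketch below assumes the same sample data as above, held in memory; the variable names are illustrative.

```python
import io

import pandas as pd

# Assumed contents of stock_data.csv, copied from the outputs above
csv_text = """tickets,eps,revenue,price,people
GOOGL,27.82,87,845,larry page
WMT,4.61,484,65,n.a.
MSFT,-1,85,64,bill gates
RIL,not available,50,1023,mukesh ambani
TATA,5.6,-1,n.a.,ratan tata
"""

# First two data rows (the header does not count toward nrows)
df_head = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Keep the header (line 0), skip lines 1 and 2, then read the next two rows
df_window = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], nrows=2)

print(df_head["tickets"].tolist())
print(df_window["tickets"].tolist())
```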


## 05. Clean Up Messy Data from CSV Using the na_values Parameter
When working with CSV files, you may encounter placeholder text for missing values that makes your data messy and difficult to work with. In Pandas, you can pass the na_values parameter to read_csv() to specify which additional values should be treated as null (NaN) while the file is read.

In [7]:
df5 = pd.read_csv(filepath, na_values=["not available", "n.a."])
df5

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845.0,larry page
1,WMT,4.61,484,65.0,
2,MSFT,-1.0,85,64.0,bill gates
3,RIL,,50,1023.0,mukesh ambani
4,TATA,5.6,-1,,ratan tata


In addition to using a list to specify which values should be treated as null values when reading a CSV file in Pandas, you can also use a dictionary to map specific null values to specific columns. This can be helpful when you need to treat different columns differently based on the null values they contain.

In [8]:
df6 = pd.read_csv(filepath, na_values={
    "revenue": [-1, "n.a.", "not applicable"],
    "eps": ["n.a.", 'not available'],
    "people": ["n.a."],
    "price": ["n.a.", "not available"]
})
df6

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87.0,845.0,larry page
1,WMT,4.61,484.0,65.0,
2,MSFT,-1.0,85.0,64.0,bill gates
3,RIL,,50.0,1023.0,mukesh ambani
4,TATA,5.6,,,ratan tata
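
One way to check that a per-column mapping did what you intended is to count the nulls in each column. The sketch below assumes the same sample data, held in memory, and a trimmed-down mapping; note that -1 is treated as missing only in revenue, so the -1 in eps survives.

```python
import io

import pandas as pd

# Assumed contents of stock_data.csv, copied from the outputs above
csv_text = """tickets,eps,revenue,price,people
GOOGL,27.82,87,845,larry page
WMT,4.61,484,65,n.a.
MSFT,-1,85,64,bill gates
RIL,not available,50,1023,mukesh ambani
TATA,5.6,-1,n.a.,ratan tata
"""

df_na = pd.read_csv(io.StringIO(csv_text), na_values={
    "revenue": [-1],
    "eps": ["not available"],
    "price": ["n.a."],
    "people": ["n.a."],
})

# Number of missing values per column
print(df_na.isna().sum())

# The -1 in the eps column is kept, because -1 was only listed for revenue
print(df_na.loc[2, "eps"])
```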


## 06. Write DataFrame into CSV Using to_csv() Method
Once you have cleaned up and processed your data in a Pandas DataFrame, you may want to save it to a CSV file for further analysis or sharing with others. You can easily do this using the to_csv() method in Pandas.

In [9]:
df6.to_csv("new_stock_data.csv")

### 01. index Parameter
The call above writes the DataFrame to a CSV file named 'new_stock_data.csv'. By default, to_csv() also writes the DataFrame index as the first column. Pass index=False to leave the index out of the file.

In [10]:
df6.to_csv("new_stock_data.csv", index=False)
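
The effect of index is easy to see without touching the disk, because to_csv() returns the CSV text when no path is given. The following sketch uses a small made-up DataFrame; it also shows why a file written with the index comes back with an 'Unnamed: 0' column when it is read again.

```python
import io

import pandas as pd

# Small illustrative DataFrame (not the tutorial's df6)
df_small = pd.DataFrame({"tickets": ["GOOGL", "WMT"], "eps": [27.82, 4.61]})

# With no path argument, to_csv() returns the CSV content as a string
with_index = df_small.to_csv()
without_index = df_small.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the indexed version back turns the unnamed index column into data
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))
```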

### 02. columns Parameter
The to_csv() method in Pandas allows you to customize the output of your DataFrame to a CSV file. One of the options you can specify is the columns parameter, which allows you to write only specific columns from your DataFrame to the CSV file.

In [11]:
df6.columns

Index(['tickets', 'eps', 'revenue', 'price', 'people'], dtype='object')

In [12]:
df6.to_csv("new_stock_data.csv", index=False, columns=["tickets", "eps"])

### 03. header Parameter
The header parameter allows you to include or exclude the column names as the first row in the CSV file.

In [13]:
df6.to_csv("new_stock_data.csv", index=False, header=False)
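
The columns and header options can be compared side by side by printing the text that to_csv() would write. This is a sketch on a small made-up DataFrame; the alias names passed to header are hypothetical.

```python
import pandas as pd

# Small illustrative DataFrame (not the tutorial's df6)
df_small = pd.DataFrame({
    "tickets": ["GOOGL", "WMT"],
    "eps": [27.82, 4.61],
    "revenue": [87, 484],
})

# Write only two of the three columns
only_two = df_small.to_csv(index=False, columns=["tickets", "eps"])

# Write the data without the column-name row
no_header = df_small.to_csv(index=False, header=False)

# A list of strings is used as aliases for the column names in the output
renamed = df_small.to_csv(index=False, header=["symbol", "earnings", "sales"])

print(only_two)
print(no_header)
print(renamed)
```

Passing a path as the first argument writes the same text to a file instead of returning it.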

## 07. Read Excel File Using read_excel() Method
To read an Excel file using Pandas, you can use the read_excel() function. This function takes the file path as an argument and returns a Pandas DataFrame object. The optional sheet_name parameter selects which worksheet to read.

In [14]:
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences
filepath = r"D:\Coding\Git Repository\Data-Science-Bootcamp-with-Python\Datasets\stock_data.xlsx"
df7 = pd.read_excel(filepath, sheet_name="stock_data")
df7

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata


## 08. Converters Argument in read_excel()
The read_excel() function in Pandas allows you to read data from an Excel file into a Pandas DataFrame. One of the arguments you can use to customize the import process is the converters parameter.

The converters parameter is used to specify a dictionary of functions that should be applied to specific columns during the import process. The keys of the dictionary represent the column names or indices, and the values are the functions to apply to the corresponding columns.

In this example, we define a custom function 'convert_people_cell()' that replaces any 'n.a.' input with the string 'jeff bezos' and returns every other value unchanged. We then read the Excel file using the read_excel() function and pass a dictionary to the converters parameter. The dictionary has one key-value pair, where the key is the name of the column to apply the function to (people), and the value is the function to apply (convert_people_cell).

In [15]:
def convert_people_cell(cell):
    if cell == "n.a.":
        return "jeff bezos"
    else:
        return cell

In [16]:
df8 = pd.read_excel(filepath, converters={
    "people": convert_people_cell
})
df8

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,jeff bezos
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata


In [17]:
def convert_eps_cell(cell):
    if cell == "not available":
        return None
    else:
        return cell

In [18]:
df9 = pd.read_excel(filepath, converters={
    "eps": convert_eps_cell,
    "people": convert_people_cell
})
df9

Unnamed: 0,tickets,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,jeff bezos
2,MSFT,-1.0,85,64,bill gates
3,RIL,,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata
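
read_csv() accepts a converters dictionary as well, which makes it possible to try the idea without an Excel file. The sketch below is an adaptation, not the tutorial's exact code: it assumes the sample data held in memory, and because read_csv() passes each cell to the converter as text, the eps converter here parses the number itself.

```python
import io

import pandas as pd

# Assumed contents of the stock data, copied from the outputs above
csv_text = """tickets,eps,revenue,price,people
GOOGL,27.82,87,845,larry page
WMT,4.61,484,65,n.a.
MSFT,-1,85,64,bill gates
RIL,not available,50,1023,mukesh ambani
TATA,5.6,-1,n.a.,ratan tata
"""


def convert_people_cell(cell):
    # Replace the placeholder with a name, keep everything else
    return "jeff bezos" if cell == "n.a." else cell


def convert_eps_cell(cell):
    # read_csv() hands over raw text, so convert to float explicitly
    return float("nan") if cell == "not available" else float(cell)


df_conv = pd.read_csv(io.StringIO(csv_text), converters={
    "eps": convert_eps_cell,
    "people": convert_people_cell,
})
print(df_conv)
```

Columns without a converter, such as price, are left untouched, so its 'n.a.' value is still present.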


## 09. Write DataFrame into Excel File Using to_excel() Method
To write a Pandas DataFrame to an Excel file, you can use the to_excel() method.

In [19]:
df9.to_excel("new_stocks.xlsx", sheet_name="Stocks")

### 01. index Parameter
The to_excel() method in Pandas allows you to write a DataFrame to an Excel file with various options to customize the output. One of these options is the index parameter, which controls whether or not to include the DataFrame's index in the Excel file.

In [20]:
df9.to_excel("new_stocks.xlsx", sheet_name="Stocks", index=False)

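### 02. startrow and startcol Parameters
The startrow and startcol parameters in the to_excel() method of Pandas allow you to specify the upper-left cell where the data is written. Both are 0-based and default to 0, so startrow=1 and startcol=1 leave one empty row and one empty column and start the output at cell B2.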

In [21]:
df9.to_excel("new_stocks.xlsx", sheet_name="Stocks", index=False, startrow=1, startcol=1)

## 10. Use ExcelWriter() Class
The ExcelWriter class in Pandas lets you write several DataFrames to different sheets of the same Excel file. You pass the writer object to each to_excel() call along with a sheet name, and the file is saved when the with block exits. More advanced formatting, such as column widths, depends on the underlying Excel engine rather than on ExcelWriter itself.

In [22]:
# Create two separate DataFrames
df_stocs = pd.DataFrame({
    "tickets": ["GOOGLE", "WMT", "MSFT"],
    "price": [845, 65, 64],
    "Pe": [30.37, 14.26, 30.97],
    "eps": [27.82, 4.61, 2.12]
})

df_weather = pd.DataFrame({
    "day": ["1/1/2020", "1/2/2020", "1/3/2020"],
    "temperature": [32, 35, 28],
    "event": ["Rain", "Sunny", "Snow"]
})

In [23]:
with pd.ExcelWriter("stocks_and_weather.xlsx") as writer:
    df_stocs.to_excel(writer, sheet_name="Stock", index=False)
    df_weather.to_excel(writer, sheet_name="Weather", index=False)