#### 1. Saving DataFrame to a CSV file

- Pandas dataframes are used to store and manipulate two-dimensional tabular data in Python.
<br>
- After pre-processing or analyzing our data,<br>
  we may want to save it as a separate CSV (Comma Separated Values) file for future use or reference. 
<br>

- The pandas **to_csv()** function is used to save a dataframe as a CSV file.

###### to_csv() function

- It is a pandas dataframe function used to save a dataframe as a CSV file. 

  **syntax:-** df.to_csv(path)

   - By default, the above call also writes the index of the dataframe as a separate column.

   - If we do not want to include the index, we pass index=False to the above function.

In [1]:
import pandas as pd

data = {
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.',
             'Apple Inc.', 'Netflix, Inc.'],
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX'],
    'Shares': [100, 50, 150, 200, 80]
}

df = pd.DataFrame(data)

df

# A dataframe with name, stock symbol, and the respective shares count of companies in a sample portfolio:

Unnamed: 0,Name,Symbol,Shares
0,Microsoft Corporation,MSFT,100
1,"Google, LLC",GOOG,50
2,"Tesla, Inc.",TSLA,150
3,Apple Inc.,AAPL,200
4,"Netflix, Inc.",NFLX,80


**Example 1: to_csv() with default parameters**

In [16]:
df.to_csv("shares.csv")

- This is how the saved CSV file looks if we open it up in Excel:

![image.png](attachment:image.png)

- We can see in the above snapshot that using the to_csv() function<br>
  with default parameters saves it along with an additional column for index.
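
- The index column can also be checked without Excel. As a minimal sketch, calling to_csv() with no path returns the CSV text as a string instead of writing a file. The two-row dataframe below is a shortened copy of the sample portfolio, used only for illustration.

```python
import pandas as pd

# Shortened copy of the sample portfolio (illustrative data only).
df = pd.DataFrame({
    'Name': ['Microsoft Corporation', 'Google, LLC'],
    'Symbol': ['MSFT', 'GOOG'],
    'Shares': [100, 50],
})

# With no path argument, to_csv() returns the CSV content as a string.
csv_text = df.to_csv()
print(csv_text)
# ,Name,Symbol,Shares
# 0,Microsoft Corporation,MSFT,100
# 1,"Google, LLC",GOOG,50
```

- The empty first field in the header row belongs to the unnamed index column. Names that contain a comma are quoted so that they stay in a single field.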

**Example 2: to_csv() with index=False**

In [13]:
# Generally, we may not want to include the index of the dataframe as a separate column,
# particularly when it is just consecutive numbers that carry no additional information. 

# For this, we can pass the parameter index=False to the to_csv() function.

df.to_csv("shares-1.csv", index=False)

- This is how the saved CSV file looks if we open it up in Excel:

![image.png](attachment:image.png)

- We can see in the above snapshot that the saved CSV<br>
  does not have an additional column for index.
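
- One practical reason to drop the index shows up when the file is read back. The sketch below is an illustration under assumed file names (with-index.csv and without-index.csv in a temporary folder); it saves the same small dataframe both ways and compares the columns that read_csv() returns.

```python
import os
import tempfile

import pandas as pd

# Small illustrative dataframe.
df = pd.DataFrame({'Symbol': ['MSFT', 'GOOG'], 'Shares': [100, 50]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, created inside a throwaway folder.
    with_index = os.path.join(tmp, 'with-index.csv')
    without_index = os.path.join(tmp, 'without-index.csv')

    df.to_csv(with_index)                   # index saved as the first column
    df.to_csv(without_index, index=False)   # index left out

    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'Symbol', 'Shares']
print(cols_without_index)  # ['Symbol', 'Shares']
```

- The saved index has no header, so it comes back as a column named "Unnamed: 0". Passing index=False when saving avoids that extra column.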

**Example 3: to_csv() with header=False**

In [14]:
# If we do not want to include column names in our saved CSV file, we pass header=False to the to_csv() function.

df.to_csv("shares-2.csv", index=False, header=False)

- This is how the saved CSV file looks if we open it up in Excel:

![image.png](attachment:image.png)

- Since we passed **header=False**, the saved CSV file doesn’t have the column headers.<br>
  We can also pass a custom list of column names to the header argument<br>
  if we want the columns to have different names. 
<br>

- Note that we also passed index=False.
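
- A minimal sketch of the custom header list mentioned above. The replacement names (Company, Ticker, Quantity) are hypothetical and chosen only for this illustration; the list needs one name per column, in column order.

```python
import pandas as pd

# Shortened copy of the sample portfolio (illustrative data only).
df = pd.DataFrame({
    'Name': ['Microsoft Corporation', 'Google, LLC'],
    'Symbol': ['MSFT', 'GOOG'],
    'Shares': [100, 50],
})

# Hypothetical replacement names, one per column, in column order.
new_names = ['Company', 'Ticker', 'Quantity']

# No path is given, so the CSV text is returned as a string.
csv_text = df.to_csv(index=False, header=new_names)
print(csv_text)
# Company,Ticker,Quantity
# Microsoft Corporation,MSFT,100
# "Google, LLC",GOOG,50
```

- Only the written header changes; the column names of the dataframe itself stay as they were.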

**Example 4: to_csv() with a subset of columns**

In [15]:
# The to_csv() function also lets us choose which columns 
# of the dataframe are saved to the CSV file. 

# We pass the names of the columns we want to include as a list to the columns argument.

df.to_csv("shares-3.csv", index=False, columns=['Symbol', 'Shares'])

- This is how the saved CSV file looks if we open it up in Excel:

![image.png](attachment:image.png)

- In the above example, we passed the columns to be included in the CSV file
  as a list to the columns argument of the to_csv() function. 
<br>
- We can see that only the columns we passed, Symbol and Shares, are present in the saved CSV file. 
<br>

- Note that we also passed index=False.
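
- The columns list also decides the order in which the columns are written. A small sketch on a shortened copy of the portfolio, with the two columns deliberately passed in reverse order:

```python
import pandas as pd

# Shortened copy of the sample portfolio (illustrative data only).
df = pd.DataFrame({
    'Name': ['Microsoft Corporation', 'Google, LLC'],
    'Symbol': ['MSFT', 'GOOG'],
    'Shares': [100, 50],
})

# Shares is listed first, so it is written first.
csv_text = df.to_csv(index=False, columns=['Shares', 'Symbol'])
print(csv_text)
# Shares,Symbol
# 100,MSFT
# 50,GOOG
```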

#### 2. Saving DataFrame to an Excel file

- Excel files can be a great way of saving our tabular data, particularly 
  when we want to display it (and even apply some formatting to it) in a GUI like Microsoft Excel.
<br>

**to_excel() function**

- The pandas DataFrame to_excel() function is used to save a pandas dataframe to an Excel file. 
  It’s like the to_csv() function, but instead of a CSV it writes the dataframe to a .xlsx file. 
<br>

**Syntax:-** df.to_excel("path/file_name.xlsx")
    
- Here, df is a pandas dataframe and is written to the Excel file file_name.xlsx at the location path. 
<br>
- By default, the dataframe is written to Sheet1, but we can also give custom sheet names.
<br>
- We can also write to multiple sheets in the same Excel workbook.
<br>

- Note that once the Excel workbook is saved, we cannot write further data without rewriting the whole workbook.
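
- Writing several sheets to one workbook can be sketched with pd.ExcelWriter: every to_excel() call made through the same writer adds one sheet. The file name portfolio-multi.xlsx, the sheet names, and the second table are assumptions made for this illustration. The example needs the openpyxl package and only prints a message if it is missing.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Symbol': ['MSFT', 'GOOG', 'TSLA'],
    'Shares': [100, 50, 150],
})

# Hypothetical second table: positions of at least 100 shares.
large = df[df['Shares'] >= 100]

sheet_names = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'portfolio-multi.xlsx')
    try:
        # One writer, several to_excel() calls: each call adds a sheet.
        with pd.ExcelWriter(path, engine='openpyxl') as writer:
            df.to_excel(writer, sheet_name='stocks', index=False)
            large.to_excel(writer, sheet_name='large_positions', index=False)

        # sheet_name=None reads every sheet into a dict keyed by sheet name.
        sheets = pd.read_excel(path, sheet_name=None)
        sheet_names = list(sheets)
    except ImportError:
        print("openpyxl is not installed; install it to run this example")

print(sheet_names)
```

- The workbook is written once, when the with block ends, which fits the note above about not adding data to an already saved workbook.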

In [2]:
import pandas as pd

data = {
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.',
             'Apple Inc.', 'Netflix, Inc.'],
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX'],
    'Shares': [100, 50, 150, 200, 80]
}

df = pd.DataFrame(data)

df

Unnamed: 0,Name,Symbol,Shares
0,Microsoft Corporation,MSFT,100
1,"Google, LLC",GOOG,50
2,"Tesla, Inc.",TSLA,150
3,Apple Inc.,AAPL,200
4,"Netflix, Inc.",NFLX,80


In [20]:
# To work with Excel files, we need to install the 'openpyxl' package.
%pip install openpyxl

Collecting openpyxl
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2
Note: you may need to restart the kernel to use updated packages.


**Example 1:- Saving dataframe to an excel file with default parameters**

In [21]:
df.to_excel("portfolio.xlsx")

**Example 2:- Save dataframe to an excel file with custom sheet name**

In [3]:
# We can specify the name of the worksheet using the sheet_name parameter.

# with custom sheet name

df.to_excel("portfolio-sheetname.xlsx", sheet_name="stocks")

#### 3. Read data from CSV files 

- The pandas read_csv() function is used to read a CSV file into a dataframe.
<br>
- It comes with a number of different parameters to customize how we’d like to read the file. 

  **Syntax for loading a csv file to a dataframe:-** df = pd.read_csv(path_to_file)
  <br>

   - Here, path_to_file is the path to the CSV file you want to load.
     <br>
   - It can be any valid string path or a URL.
     <br>
   - It returns a pandas dataframe. 

**Example 1:- Read CSV from its location on our machine**

In [2]:
# To read a CSV file stored locally on our machine,
# we pass the path to the file to the read_csv() function. 
# We can pass a relative path, that is, a path relative to our current working directory, 
# or we can pass an absolute path.

# read csv using relative path

import pandas as pd

df = pd.read_csv('Iris.csv')

df.head()

# In the above example, the CSV file Iris.csv is loaded from its location using a relative path. 
# Here, the file is present in the current working directory.

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [5]:
# We can also read a CSV file from its absolute path. 

# read csv using absolute path

import pandas as pd


df = pd.read_csv(r"C:\Users\HP\DATA SCIENCES-NEW\2-ADVANCED PYTHON\2-Pandas\Iris.csv")

df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


**Example 2:- Read CSV from a URL**

In [6]:
# We can also read a CSV file from its URL. 
# Pass the URL to the read_csv() function and it’ll read the corresponding file to a dataframe. 
# The Iris dataset can also be downloaded from the UCI Machine Learning Repository. 
# Let’s use their dataset download URL to read it as a dataframe.

import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")

df.head()

Unnamed: 0,5.1,3.5,1.4,0.2,Iris-setosa
0,4.9,3.0,1.4,0.2,Iris-setosa
1,4.7,3.2,1.3,0.2,Iris-setosa
2,4.6,3.1,1.5,0.2,Iris-setosa
3,5.0,3.6,1.4,0.2,Iris-setosa
4,5.4,3.9,1.7,0.4,Iris-setosa


- We can see that the read_csv() function is able to read a dataset from its URL. 

- Note that this particular data source has no header row. 

- By default, read_csv() treats the first row of the file as the header, so here the first data row (5.1, 3.5, 1.4, 0.2, Iris-setosa) ends up as the column names.

**Example 3:- Read a CSV file without a header**

In [7]:
# In the above example, we saw that if the dataset does not have a header, 
# the read_csv() function still uses the first row of the dataset as the header.
# We can change this behavior through the header parameter:
# pass None if the dataset does not have a header.
# We can also pass a row number, or a list of row numbers, to say which row(s) hold the column names.

import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)

df.head()

# In the above example, we pass header=None to the read_csv() function since the dataset did not have a header.

Unnamed: 0,0,1,2,3,4
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
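
- The effect of header=None can also be reproduced offline. In this sketch the first three rows of iris.data are inlined as a string and read through io.StringIO, which is an assumption made so that the example does not depend on the URL being reachable.

```python
import io

import pandas as pd

# First three rows of iris.data, inlined so the example runs offline.
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

inferred = pd.read_csv(io.StringIO(raw))                 # first row becomes the header
no_header = pd.read_csv(io.StringIO(raw), header=None)   # every row is kept as data

print(inferred.shape)            # (2, 5)
print(no_header.shape)           # (3, 5)
print(list(no_header.columns))   # [0, 1, 2, 3, 4]
```

- With the default setting one data row is lost to the header; with header=None all rows are kept and the columns are numbered from 0.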


**Example 4:- Read a CSV file and give custom column names**

In [8]:
# We can give custom column names to the dataframe when reading a CSV file using the read_csv() function.
# Pass the custom column names as a list to the names parameter.

import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                 names = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])

df.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


**Example 5:- Read CSV with a column as index**

In [9]:
# We can also use a column as the row labels of the dataframe.

# Pass the column name to the index_col parameter. 

# read csv with a column as index

import pandas as pd

df = pd.read_csv('Iris.csv', index_col='Id')

df.head()

# In the above example, we can see that the Id column is used as the row index of the dataframe df. 

# We can also pass multiple columns as a list to the index_col parameter to be used as the row index.

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
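
- A minimal sketch of passing several columns to index_col. The small weather table is hypothetical and inlined through io.StringIO so the example is self-contained; the two index columns together form a multi-level row index.

```python
import io

import pandas as pd

# Hypothetical data, inlined so that no file is needed.
csv_text = """Date,Location,MinTemp
2008-12-01,Albury,13.4
2008-12-01,Sydney,19.5
2008-12-02,Albury,7.4
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=['Date', 'Location'])

print(list(df.index.names))   # ['Date', 'Location']

# A row is now addressed by a (Date, Location) pair.
print(df.loc[('2008-12-01', 'Sydney'), 'MinTemp'])   # 19.5
```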


**Example 6:- Read only a subset of columns of a CSV**

In [10]:
# We can also specify the subset of columns to read from the dataset.
# Pass the subset of columns we want as a list to the usecols parameter. 

# For example, let’s read all the columns from Iris.csv except Id.

# read csv with a subset of columns

import pandas as pd

df = pd.read_csv('Iris.csv', usecols=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])

df.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


**Example 7:- Read only the first n rows of a CSV**

In [11]:
# We can also specify the number of rows of a file to read using the nrows parameter of the read_csv() function. 
# This is particularly useful when we want to read a small segment of a large file.

# read only the first n rows of a csv

import pandas as pd

df = pd.read_csv('Iris.csv', nrows=3)

df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa


#### 4. Get Row Count

- The size of the dataframe is an important factor in deciding
  which manipulations and processes can be applied to it. 
<br>
- For example, if we have limited resources and are working with large datasets, 
  it is important to use processes that are not compute-heavy.
<br>
- There are a number of ways to get the number of rows of a pandas dataframe: 
  we can read it from the shape of the dataframe, or 
  we can use the len() function.

In [4]:
import pandas as pd

df = pd.read_csv('weatherAUS.csv')

df.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No


###### Get row count using .shape[0]

In [7]:
# The .shape property gives us the shape of the dataframe in the form of a (row_count, column_count) tuple. 
# That is, the first element of the tuple gives us the row count of the dataframe. 

# Let’s get the shape of the above dataframe:

# number of rows using .shape[0]

print("Shape of the dataframe:-", df.shape)

print()

print("No of rows in the dataframe:-", df.shape[0])

# We can see that df.shape gives the tuple (145460, 23) denoting that 
# the dataframe df has 145460 rows and 23 columns. 

# If we specifically want just the number of rows, use df.shape[0]

Shape of the dataframe:- (145460, 23)

No of rows in the dataframe:- 145460


###### Get row count using the len() function

In [8]:
# We can also use the built-in Python len() function to determine the number of rows. 

# This function returns the length of objects that define one, such as lists, strings, and dataframes. 

print("Number of rows using len() function:-", len(df))

Number of rows using len() function:- 145460
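
- The same checks can be tried on a small dataframe without the weather file. The table below is hypothetical and contains one missing value on purpose, to show that count() answers a different question than the row count.

```python
import numpy as np
import pandas as pd

# Small hypothetical weather table with one missing value.
df = pd.DataFrame({
    'Location': ['Albury', 'Albury', 'Albury'],
    'MinTemp': [13.4, np.nan, 12.9],
})

print(df.shape[0])     # 3
print(len(df))         # 3
print(len(df.index))   # 3

# count() skips missing values, so it is not a row count.
print(df['MinTemp'].count())   # 2
```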


#### 5. Get Value of a Cell in Dataframe

![image.png](attachment:image.png)

###### 1. iat property

- Use the Pandas dataframe **iat** property to get the cell value of a dataframe<br>
  using its row and column indices (integer positions). 
<br>

- Alternatively, if we want to access the value of a cell using its row and column labels, use the **at** property.

  **Syntax:-**
  <br>
  
  - get cell value using row and column indices (integer positions):- **df.iat[row_position, column_position]**
  <br>

  - get cell value using row and column labels:- **df.at[row_label, column_label]**
<br>
- It returns the single cell value for the given row/column pair.
<br>

- Note that we use the **iat** or the **at** property specifically to access a single value
  for the given row and column indices (or labels). 
<br>

- We can also use the iloc and loc properties to access the value of a single cell (or of multiple cells) in a Pandas dataframe.

In [9]:
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"]
}

df = pd.DataFrame(data)

df

# Here, we created a dataframe containing information about some employees in an office. 
# The dataframe has 4 rows and 3 columns (“Name”, “Age”, and “Department”).

Unnamed: 0,Name,Age,Department
0,Jim,26,Sales
1,Dwight,28,Sales
2,Angela,27,Accounting
3,Tobi,32,HR


**Example 1 – Access the dataframe cell value using the iat property**

In [10]:
# Let’s get the department of the employee “Dwight”.

# To access a cell value using the iat property, we need to provide its row and column indices. 

# Note that rows and columns in a Pandas dataframe are indexed starting from 0 by default.

# get cell value using row and column indices (integer position)

df.iat[1, 2]

'Sales'

**Example 2 – Access the dataframe cell value using the at property**

In [11]:
# This time, we use the row and column labels with the at property.

# The column label is the column name itself and since the above dataframe does not have explicitly defined row labels,
# we will use its row indices as the row label.

# get cell value using row and column labels

df.at[1, "Department"]

# We get the “Department” value in row 1 (which represents the department of the employee “Dwight”).

'Sales'
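
- The difference between at and iat is easier to see once the rows have explicit labels. In this sketch the Name column is turned into the row index, which is an assumption made only for illustration.

```python
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"]
}

# Use the Name column as row labels (an assumption made for this sketch).
df_named = pd.DataFrame(data).set_index("Name")

# at looks up by label: row "Dwight", column "Department".
by_label = df_named.at["Dwight", "Department"]

# iat looks up by position: second row, second column.
# Name is now the index, so Department sits at column position 1.
by_position = df_named.iat[1, 1]

print(by_label, by_position)   # Sales Sales
```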

###### 2. Using iloc and loc to access a cell value in Pandas dataframe

- The iloc and loc properties of a Pandas dataframe are used to access<br>
  a group of rows and columns, but we can also use them to access the value for a single cell.

In [12]:
# get cell value using row and column indices (integer position)

df.iloc[1, 2]

# Here, we get the value of the cell represented by row index 1 and column index 2 using the iloc property.

'Sales'

In [13]:
# get cell value using row and column labels

df.loc[1, "Department"]

# Here, we get the value of the cell represented by the row label 1 
# and the column label “Department” using the loc property.

'Sales'
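
- For completeness, a short sketch of iloc and loc selecting a group of rows and columns from the same employee data. One detail to keep in mind: a position slice with iloc leaves out the stop value, while a label slice with loc includes it.

```python
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"]
}

df = pd.DataFrame(data)

# Rows at positions 1 and 2 (stop 3 is excluded), columns at positions 0 and 2.
rows_by_position = df.iloc[1:3, [0, 2]]

# Rows labelled 1 to 2 (stop 2 is included), columns selected by name.
rows_by_label = df.loc[1:2, ["Name", "Department"]]

print(rows_by_position.equals(rows_by_label))   # True
```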