# CSV Files and Pandas

pandas is a Python library for data manipulation and analysis. Its central data structure, the DataFrame, is a table with labelled rows and columns, similar to a spreadsheet. A DataFrame has three principal components: the data, the row labels (the index), and the column labels.

In [None]:
# Need to import the pandas library

import pandas as pd
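
The three components described above can be inspected directly. The following is a minimal sketch on a small made-up table; the names `sample`, `player_a`, `player_b`, `age`, and `weight` are hypothetical and are not taken from `nba.csv`.

```python
import pandas as pd

# Hypothetical sample data, not taken from nba.csv
sample = pd.DataFrame(
    [[25, 180], [30, 200]],
    index=["player_a", "player_b"],
    columns=["age", "weight"],
)

row_labels = list(sample.index)       # the row labels (the index)
column_labels = list(sample.columns)  # the column labels
values = sample.to_numpy()            # the underlying data as a NumPy array

print(row_labels)
print(column_labels)
print(values)
```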

## CSV to DataFrame

In [None]:
# read_csv() is a pandas function that reads a CSV file into a DataFrame

df = pd.read_csv('nba.csv')

In [None]:
df

In [None]:
# .shape is a property (not a method)
# It returns the shape of the DataFrame as
# (num of rows, num of columns)

df.shape

In [None]:
# Designating an index column

df = pd.read_csv('nba.csv', index_col = "Name")

In [None]:
df
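
To see what `index_col` changes without needing a file on disk, here is a small sketch that reads CSV text from memory. The CSV content and the names `csv_text`, `default_df`, and `named_df` are assumptions made for illustration; only the `Name` column label mirrors the cell above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Team,Age\nAlice,Red,25\nBob,Blue,30\n"

# Without index_col, pandas adds a default integer index (0, 1, ...)
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col, the "Name" column becomes the row labels
named_df = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(default_df.shape)            # Name counts as a regular column here
print(named_df.shape)              # Name is now the index, so one column fewer
print(named_df.loc["Bob", "Age"])  # rows can be looked up by name
```

Because the index is not counted as a column, `.shape` reports one column fewer once `Name` is used as the index.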

In [None]:
# describe() returns summary statistics (count, mean, std, min, quartiles, max)
# for the numeric columns

df.describe()

### Missing values

Data scientists spend a lot of time cleaning data, and handling missing values is one of the main tasks. Pandas has several built-in methods for this purpose.

In [None]:
# Which values are missing?

df.isna()

In [None]:
# Drop the rows with missing values. 
# This doesn't change the original DataFrame. 

df.dropna()

In [None]:
df

In [None]:
# Replace NaN with a value

df.fillna(0)
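
The three methods above can be tried on a small table where the missing entries are known in advance. This is only a sketch; the DataFrame `scores` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
scores = pd.DataFrame(
    {"points": [10.0, np.nan, 7.0], "assists": [3.0, 4.0, np.nan]},
    index=["a", "b", "c"],
)

# isna() gives a table of True/False; summing counts the missing values per column
missing_per_column = scores.isna().sum()

# dropna() returns a new DataFrame containing only the complete rows
complete_rows = scores.dropna()

# fillna(0) returns a new DataFrame with every NaN replaced by 0
filled = scores.fillna(0)

print(missing_per_column)
print(complete_rows)
print(filled)
print(scores)  # the original still contains its NaN values
```

Both `dropna()` and `fillna()` return new DataFrames, so the result has to be assigned to a variable if it is needed later.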

In [None]:
# Using iloc (integer location) to select the first 10 rows and first 4 columns by position.

df.iloc[0:10, 0:4]

In [None]:
# iloc also accepts lists of integer positions to pick specific rows and columns.

df.iloc[[0, 2, 4, 6, 8], [3, 4, 5]]
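
Both forms of `iloc` are easier to follow on a table small enough to check by eye. The sketch below uses a made-up DataFrame called `grid`; it also shows `loc`, the label-based counterpart, for comparison.

```python
import pandas as pd

# Hypothetical 4x3 table; each value encodes its row (tens) and column (ones)
grid = pd.DataFrame(
    [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]],
    index=["w", "x", "y", "z"],
    columns=["a", "b", "c"],
)

# Slices work like Python slices: the end position is excluded
block = grid.iloc[0:2, 0:2]

# Lists of positions pick exactly those rows and columns
picked = grid.iloc[[0, 2], [1, 2]]

# loc selects the same cells by label instead of by position
by_label = grid.loc[["w", "y"], ["b", "c"]]

print(block)
print(picked)
print(picked.equals(by_label))
```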

## DataFrame to CSV

### Creating a DataFrame

In [None]:
# Creating a DataFrame from a list of lists
# Each inner list becomes a row of the DataFrame

data = [[1,2,3],[4, 5, 6]]
df = pd.DataFrame(data)
df

In [None]:
# Creating a DataFrame from a single list
# Each element in the list becomes a row of the DataFrame

data = [1,2,3]
df = pd.DataFrame(data)
df

In [None]:
# Row and column labels can be passed as lists (positionally: data, index, columns)

data = [[1,2,3],[4, 5, 6]]
df = pd.DataFrame(data, ["row1", "row2"], ["col1", "col2", "col3"])
df

In [None]:
# When a call takes several arguments, it is clearer to pass them by keyword

data = [[1,2,3],[4, 5, 6]]
df = pd.DataFrame(data, index=["row1", "row2"], columns=["col1", "col2", "col3"])
df

In [None]:
# Returns the shape of a DataFrame as (num of rows, num of columns)

df.shape

### Writing and Appending DataFrame to CSV file

In [None]:
# Writing a DataFrame to a CSV file
# Passing the path lets pandas open and close the file itself

df.to_csv("data.csv")

In [None]:
# Appending to a CSV file
# mode="a" appends to the existing file; header=False avoids repeating the column names

df2 = pd.DataFrame([[7, 8, 9]], index=["row3"])

df2.to_csv("data.csv", mode="a", header=False)
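
One way to check that writing and appending worked is to read the file back. The following sketch repeats both steps in a temporary directory so that it leaves no file behind; the names `first`, `extra`, `path`, and `combined` are hypothetical.

```python
import os
import tempfile

import pandas as pd

labels = ["col1", "col2", "col3"]
first = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["row1", "row2"], columns=labels)
extra = pd.DataFrame([[7, 8, 9]], index=["row3"], columns=labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")

    # Write the first DataFrame, including the header row
    first.to_csv(path)

    # Append the extra row without writing the header again
    extra.to_csv(path, mode="a", header=False)

    # Read the file back, using the first column as the row labels
    combined = pd.read_csv(path, index_col=0)

print(combined)
```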