# PANDAS READ CSV IN PYTHON

CSV stands for Comma-Separated Values.
It is a simple text-based file format used to store tabular data (like a spreadsheet or database table) in which each line represents a row and each value within that row is separated by a comma.

In [2]:
# snippet
import pandas as pd

data = pd.read_csv('people_data.csv')
print(data)

           Name   Age            City  Occupation
0      John Doe    28        New York    Engineer
1   Alice Smith    34     Los Angeles    Designer
2   Bob Johnson    22         Chicago   Developer
3    Emma Brown    45           Miami     Manager
4  David Wilson    39         Seattle      Doctor
5  Sophie Davis    26   San Francisco     Teacher


Here, read_csv() loads the contents of a file called "people_data.csv" into a DataFrame.

We can also go the other way and save a DataFrame to a CSV file with to_csv().

In [3]:
# snippet
import pandas as pd

data = {
    'Name': ['John Doe', 'Alice Smith', 'Bob Johnson', 'Emma Brown', 'David Wilson', 'Sophie Davis'],
    'Age': [28, 34, 22, 45, 39, 26],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami', 'Seattle', 'San Francisco'],
    'Occupation': ['Engineer', 'Designer', 'Developer', 'Manager', 'Doctor', 'Teacher']
}

# create dataframe
df = pd.DataFrame(data)

# save to csv
df.to_csv('people_data1.csv', index=False)

print("SAVED TO CSV")

SAVED TO CSV
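The snippet above passes index=False without saying why. As a minimal sketch (using a two-row DataFrame made up for illustration, and writing to a string instead of a file), this is what that argument changes:

```python
import io
import pandas as pd

# Small illustrative DataFrame (not the full people data)
df = pd.DataFrame({'Name': ['John Doe', 'Alice Smith'], 'Age': [28, 34]})

# When no path is given, to_csv() returns the CSV text as a string
with_index = df.to_csv()                # row labels 0, 1 are written as a first column
without_index = df.to_csv(index=False)  # only the actual data columns are written

print(with_index)
print(without_index)

# Reading the indexed version back produces an extra, unnamed column
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```

Leaving the index out is usually what you want when the row labels are just the default 0, 1, 2, ... numbering.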


## read_csv() syntax

In [None]:
import pandas as pd

# Syntax
df = pd.read_csv('file_path.csv', sep=',', header='infer', index_col=None, usecols=None, dtype=None, engine='c')

### Parameters:
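**'file_path.csv'**: Path to your CSV file. This can be a local file path, a URL, or an open file-like object.

**sep**: The delimiter between values (default is , for CSV files). Change it if your file uses another delimiter (e.g., \t for tab-separated values).

**header**: Row number(s) to use as the column names. The default, 'infer', uses the first row as the header unless you supply your own column names. Pass header=None if the file has no header row.

**index_col**: Column(s) to set as the index (default is None, meaning no column is used and pandas assigns a default 0, 1, 2, ... index).

**usecols**: Subset of columns to read, given as names or positions (default is None, meaning all columns).

**dtype**: Data type to force, either a single type for all columns or a dictionary mapping column names to types (default is None, meaning pandas infers the types).

**engine**: Parsing engine. 'c' is faster and is used by default; 'python' is slower but supports more features. Recent pandas versions also accept 'pyarrow'.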
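To see several of these parameters working together, here is a minimal sketch that parses tab-separated text with no header row. The sample text is made up for illustration, and the names parameter (which supplies column names) is an extra that is not in the list above:

```python
import io
import pandas as pd

# Hypothetical tab-separated data with no header row
tsv_text = "John Doe\t28\tNew York\nAlice Smith\t34\tLos Angeles\n"

df = pd.read_csv(
    io.StringIO(tsv_text),            # a file-like object stands in for a file path
    sep='\t',                         # values are separated by tabs, not commas
    header=None,                      # the first line is data, not column names
    names=['Name', 'Age', 'City'],    # so we supply the column names ourselves
    dtype={'Age': 'float64'},         # force Age to float instead of the inferred int
)

print(df)
print(df.dtypes)
```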

## Some read_csv() features

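1. Reading Specific Columns (usecols)

The **usecols** parameter tells read_csv() to load only the columns you list, which saves time and memory when a file has many columns you don't need.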

In [4]:
df = pd.read_csv('people_data1.csv', usecols=['Name', 'Age'])
print(df)

           Name  Age
0      John Doe   28
1   Alice Smith   34
2   Bob Johnson   22
3    Emma Brown   45
4  David Wilson   39
5  Sophie Davis   26
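usecols is not limited to a list of names. As a rough sketch (parsing a short in-memory sample instead of people_data1.csv so that it runs on its own), columns can also be picked by position or with a function that is called on each column name:

```python
import io
import pandas as pd

# Two illustrative rows in the same layout as the people data
csv_text = (
    "Name,Age,City,Occupation\n"
    "John Doe,28,New York,Engineer\n"
    "Alice Smith,34,Los Angeles,Designer\n"
)

# Select by position: the first and second columns
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# Select by rule: keep every column whose name is not 'Occupation'
by_rule = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != 'Occupation')

print(list(by_position.columns))
print(list(by_rule.columns))
```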


2. Setting an Index Column (index_col)

The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.

In [6]:
df = pd.read_csv('people_data1.csv', index_col='Name')
print(df)

              Age           City Occupation
Name                                       
John Doe       28       New York   Engineer
Alice Smith    34    Los Angeles   Designer
Bob Johnson    22        Chicago  Developer
Emma Brown     45          Miami    Manager
David Wilson   39        Seattle     Doctor
Sophie Davis   26  San Francisco    Teacher
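The "easier data referencing" mentioned above mostly means label-based lookup with .loc. A minimal sketch, again using a short in-memory sample so it does not depend on people_data1.csv:

```python
import io
import pandas as pd

# Two illustrative rows in the same layout as the people data
csv_text = (
    "Name,Age,City,Occupation\n"
    "John Doe,28,New York,Engineer\n"
    "Alice Smith,34,Los Angeles,Designer\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col='Name')

# With Name as the index, rows can be looked up by name instead of by position
row = df.loc['Alice Smith']          # the whole row as a Series
age = df.loc['Alice Smith', 'Age']   # a single value

print(row)
print(age)
```

Note that Name is no longer a regular column after this; it lives in df.index.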
