# Importing Data
One of the most useful features of pandas is its ability to read many different file types into a DataFrame for analysis in Python. For our purposes, we'll focus on reading .csv and .xlsx files, but pandas can also read .tsv, .dat, and many other delimited or tabular file formats.
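
As a quick illustration of reading something other than a comma-separated file, here is a minimal sketch that reads tab-separated text with read_csv() by passing sep="\t". The column names and the in-memory text are made up for the example (the click counts are borrowed from the table used later); with a real file you would pass its path instead of the StringIO object.

```python
import io

import pandas as pd

# A tiny tab-separated table held in memory so the example needs no file on disk.
# The column names here are illustrative, not taken from a real data set.
tsv_text = "state\tclicks\nAlaska\t63\nAlabama\t376\n"

# sep="\t" tells read_csv to split fields on tabs instead of commas.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(tsv_df)
```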

### The read_csv() function

In [1]:
import pandas as pd
help(pd.read_csv)

Help on function read_csv in module pandas.io.parsers:

read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal: str = '.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
    Read a comma-separated values (csv) file into DataFrame.

The read_csv() function does what its name suggests: it reads a .csv file and returns a pandas DataFrame. As the signature above shows, read_csv() accepts a lot of different arguments; we're going to cover a few of the most important ones. First, here is a basic call to pd.read_csv().

In [2]:
df = pd.read_csv("ctr_by_state.csv")

In [3]:
df

Unnamed: 0,Row Labels,State,Sum of Impressions,Sum of Clicks,Sum of CTR
0,AK,Alaska,19512,63,0.00323
1,AL,Alabama,119196,376,0.00315
2,AR,Arkansas,55970,157,0.00281
3,AZ,Arizona,185120,527,0.00285
4,CA,California,1089852,3250,0.00298
5,CO,Colorado,227018,610,0.00269
6,CT,Connecticut,95720,291,0.00304
7,DC,district of columbia,113124,324,0.00286
8,DE,Delaware,25196,68,0.0027
9,FL,Florida,781268,2005,0.00257


In [4]:
df.head()

Unnamed: 0,Row Labels,State,Sum of Impressions,Sum of Clicks,Sum of CTR
0,AK,Alaska,19512,63,0.00323
1,AL,Alabama,119196,376,0.00315
2,AR,Arkansas,55970,157,0.00281
3,AZ,Arizona,185120,527,0.00285
4,CA,California,1089852,3250,0.00298


In [5]:
#showing index_col
df = pd.read_csv("ctr_by_state.csv", index_col="Row Labels")

In [6]:
df.head()

Row Labels,State,Sum of Impressions,Sum of Clicks,Sum of CTR
AK,Alaska,19512,63,0.00323
AL,Alabama,119196,376,0.00315
AR,Arkansas,55970,157,0.00281
AZ,Arizona,185120,527,0.00285
CA,California,1089852,3250,0.00298


The index_col argument lets you use one of the columns in your .csv file as the index, instead of the numeric index that pandas generates by default.
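
To make the effect of index_col concrete, here is a small self-contained sketch. It uses a three-row excerpt of the table typed in as a string, not the real ctr_by_state.csv, and shows that the column can be chosen by name or by position. Once the abbreviations are the index, rows can be looked up by label with .loc.

```python
import io

import pandas as pd

# A small in-memory excerpt standing in for the real file (illustrative only).
csv_text = """Row Labels,State,Sum of Clicks
AK,Alaska,63
AL,Alabama,376
CA,California,3250
"""

# index_col accepts a column name...
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Row Labels")
# ...or a zero-based column position.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With the state abbreviations as the index, look up a row by its label.
print(by_name.loc["CA", "State"])
```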

In [7]:
#showing the skiprows argument
df = pd.read_csv("ctr_by_state.csv", skiprows=1)
df.head()

Unnamed: 0,AK,Alaska,19512,63,0.00323
0,AL,Alabama,119196,376,0.00315
1,AR,Arkansas,55970,157,0.00281
2,AZ,Arizona,185120,527,0.00285
3,CA,California,1089852,3250,0.00298
4,CO,Colorado,227018,610,0.00269


The skiprows parameter specifies how many lines at the top of the file to skip before pandas builds the DataFrame. Here, we wouldn't actually want to skip any rows: skipping the first line throws away the real header, so pandas uses the first data row (the Alaska row) as the column names instead. In other cases, where a file begins with lines that are not part of the table, such as a title or notes, skiprows is very useful.
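
Here is a minimal sketch of the case where skiprows helps. The two note lines at the top of this made-up export are assumptions for illustration; skipping them lets pandas read the third line as the header. skiprows also accepts a list of zero-based line numbers, which is shown as an equivalent call.

```python
import io

import pandas as pd

# A made-up export with two note lines above the real header row.
report_text = """Exported report
Generated for illustration only
Row Labels,State,Sum of Clicks
AK,Alaska,63
AL,Alabama,376
"""

# Skip the first two lines so the third line becomes the header.
report_df = pd.read_csv(io.StringIO(report_text), skiprows=2)

# Equivalent call: list the zero-based line numbers to skip.
report_df_by_list = pd.read_csv(io.StringIO(report_text), skiprows=[0, 1])

print(report_df.columns.tolist())
```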

### The read_excel() function
Pandas can also work with Excel files, using the read_excel() function. Let's take a look at it.

In [None]:
help(pd.read_excel)

In [8]:
df = pd.read_excel("ctr_by_state.xlsx")

In [9]:
df.shape

(51, 5)

If your Excel workbook has multiple sheets, use the sheet_name argument to choose which sheet to import. By default, read_excel() reads the first sheet.

In [10]:
df = pd.read_excel("ctr_by_state.xlsx", sheet_name="Sheet1")

In [11]:
df.shape

(15, 5)

In [12]:
df.head()

Unnamed: 0,Row Labels,State,Sum of Impressions,Sum of Clicks,Sum of CTR
0,AK,Alaska,19512,63,0.00323
1,AL,Alabama,119196,376,0.00315
2,AR,Arkansas,55970,157,0.00281
3,AZ,Arizona,185120,527,0.00285
4,CA,California,1089852,3250,0.00298
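
If you are not sure which sheets a workbook contains, one option is to pass sheet_name=None, which makes read_excel() return a dictionary mapping each sheet name to its own DataFrame. The helper below, sheet_shapes, is a hypothetical name introduced for this sketch. It also assumes an Excel engine such as openpyxl is installed, which pandas needs to read .xlsx files.

```python
import pandas as pd


def sheet_shapes(path):
    """Return {sheet name: (rows, columns)} for every sheet in a workbook.

    Hypothetical helper for illustration; reading .xlsx files requires an
    Excel engine such as openpyxl to be installed.
    """
    # sheet_name=None asks read_excel for all sheets as a dict of DataFrames.
    sheets = pd.read_excel(path, sheet_name=None)
    return {name: frame.shape for name, frame in sheets.items()}


# Example call, using the workbook from the cells above:
# sheet_shapes("ctr_by_state.xlsx")
```

Comparing the shapes sheet by sheet is a quick way to confirm that the sheet you loaded is the one you intended.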
