# Pandas

Pandas or "Panel Data" is the core library to handle structured data in Python. Structured data is just data in a tabular format (like in Excel). 

The core of Pandas is the DataFrame. A DataFrame is essentially a table, and each of its columns is a Series. A Series is a one-dimensional set of values paired with an index, which labels each value (by default 0, 1, 2, ...). Try the code below.

```python
import pandas as pd

series_test = [1, 2, 3, 6, 7]
print(pd.Series(series_test))
```

```
0    1
1    2
2    3
3    6
4    7
dtype: int64
```
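The cell above builds a single Series. A DataFrame is a set of Series, one per column, that share a single index. A minimal sketch, using made-up example data (the names `people`, `name`, and `age` are hypothetical), builds a DataFrame from a dictionary, pulls one column back out as a Series, and then gives a Series its own labels instead of the default 0, 1, 2, ...:

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column name
# and each list becomes that column's values.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [31, 25, 40],
})
print(people)

# Selecting one column returns a Series that keeps the DataFrame's index.
ages = people["age"]
print(type(ages))
print(ages.index.equals(people.index))  # True

# A Series can also carry custom labels as its index.
labelled = pd.Series([1, 2, 3], index=["a", "b", "c"])
print(labelled["b"])  # 2
```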


### DataFrames

So far we've almost entirely created our own data, but in practice you'll usually work with data that already exists. This is where Pandas' family of `read_*` functions comes in (`pd.read_csv`, `pd.read_excel`, and so on). They let you read data from many different sources, including:

- CSVs
- Excel
- Stata
- SAS
- SPSS
- SQL
- BigQuery
- ORC
- and much more!
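Each of these sources has its own reader on the top-level `pandas` module, for example `pd.read_csv`, `pd.read_excel`, `pd.read_stata`, `pd.read_sas`, `pd.read_spss`, `pd.read_sql`, and `pd.read_orc`. Several of them rely on optional extra packages, and the exact list depends on your pandas version, so one quick way to see what your installation offers is to list every attribute whose name starts with `read_`:

```python
import pandas as pd

# Collect the names of all reader functions exposed by the installed pandas.
readers = sorted(name for name in dir(pd) if name.startswith("read_"))
print(readers)
```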

In this tutorial we'll mostly stick to reading CSVs, because reading from a SQL database usually requires extra setup, such as a database connection, that has to be configured correctly first.

CSV stands for "comma-separated values". It is a plain-text format in which the values within a record are separated by commas and each record sits on its own line, separated by a newline (the "enter" key). It belongs to a family of formats that includes tab-separated values, where tabs take the place of the commas, and pipe-separated values, where pipes (|) take their place. The commas, tabs, and pipes are called "delimiters".
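`pd.read_csv` assumes commas by default, and its `sep` argument lets it read the other members of the family too. A minimal sketch with made-up data, using in-memory strings wrapped in `io.StringIO` as stand-ins for real files, reads the same two records written with three different delimiters:

```python
import io
import pandas as pd

# The same made-up records written with three different delimiters.
comma_text = "name,score\nAnn,90\nBob,85\n"
tab_text = "name\tscore\nAnn\t90\nBob\t85\n"
pipe_text = "name|score\nAnn|90\nBob|85\n"

comma_df = pd.read_csv(io.StringIO(comma_text))         # comma is the default delimiter
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")   # tab-separated values
pipe_df = pd.read_csv(io.StringIO(pipe_text), sep="|")  # pipe-separated values

# All three produce identical DataFrames.
print(comma_df.equals(tab_df) and comma_df.equals(pipe_df))  # True
```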

```python
import os

pwd = os.getcwd()

pwd
```

```
'/Users/dickinsd/Github/python-for-data-analysts'
```

```python
import os  # Used to build the path to the file
import pandas as pd

pwd = os.getcwd()  # The current working directory as a string (in a notebook, usually the notebook's folder)

filepath = os.path.join(pwd, "simple_csv.csv")  # Full path to the simple_csv.csv file

first_import = pd.read_csv(filepath)  # Read the CSV into a DataFrame
first_import  # In a notebook, the last expression in a cell is displayed
```

| Unnamed: 0 | Column1 | Column2 | Column3 |
|---|---|---|---|
| 0 | 0.280925 | 0.910368 | 0.692982 |
| 1 | 0.719882 | 0.210024 | 0.761276 |
| 2 | 0.235752 | 0.059796 | 0.154667 |
| 3 | 0.603366 | 0.485614 | 0.013345 |
| 4 | 0.039985 | 0.236359 | 0.811832 |
| 5 | 0.015478 | 0.543641 | 0.112719 |
| 6 | 0.19254 | 0.409874 | 0.875082 |
| 7 | 0.380921 | 0.045468 | 0.515859 |
| 8 | 0.592471 | 0.786421 | 0.047249 |
| 9 | 0.072112 | 0.695694 | 0.409573 |
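`Unnamed: 0` is the name pandas gives a column whose header is blank. That typically happens when a CSV was written with `DataFrame.to_csv()`, which by default saves the index as an unlabelled first column; whether that is how `simple_csv.csv` was created is an assumption here. A sketch with made-up data shows the effect, and how passing `index_col=0` to `pd.read_csv` turns that column back into the index:

```python
import io
import pandas as pd

# Made-up data written with to_csv(), which saves the index as a first
# column with an empty header.
original = pd.DataFrame({"Column1": [0.1, 0.2], "Column2": [0.3, 0.4]})
csv_text = original.to_csv()  # with no path, to_csv() returns the CSV as a string

with_unnamed = pd.read_csv(io.StringIO(csv_text))
print(with_unnamed.columns.tolist())  # ['Unnamed: 0', 'Column1', 'Column2']

# index_col=0 uses the first column as the index instead of as data.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.columns.tolist())  # ['Column1', 'Column2']
```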


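To read a specific sheet from an Excel file, pass the sheet's name to the `sheet_name` argument of `pd.read_excel`. Reading `.xlsx` files also requires the `openpyxl` package to be installed.

```python
# Here filepath must point to an Excel workbook (e.g. .xlsx), not the CSV used above
excel_import = pd.read_excel(filepath, sheet_name="Sheet1")
```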

```python
first_import["Column1"]
```

```
0     0.280925
1     0.719882
2     0.235752
3     0.603366
4     0.039985
5     0.015478
6     0.192540
7     0.380921
8     0.592471
9     0.072112
10    0.898072
11    0.722038
12    0.513517
13    0.436729
14    0.864039
15    0.724761
16    0.198926
17    0.748500
18    0.920639
19    0.452024
20    0.542977
21    0.355266
22    0.142688
23    0.850436
24    0.443476
25    0.467100
26    0.338536
27    0.942101
28    0.570130
Name: Column1, dtype: float64
```

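```python
first_import.Column1     # Attribute access: only works when the column name is a valid Python identifier
first_import["Column1"]  # Bracket access: works for any column name
```

You can use either of these techniques to select a column from a table. I prefer the second one because it also works for column names that contain spaces, or that clash with a DataFrame method (for example, a column called `count`).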

```python
sum(first_import.Column1)  # Checking whether you can sum a column: Python's built-in sum() works because a Series is iterable
```

```
14.265388315
```
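Pandas also gives every Series its own aggregation methods, such as `.sum()` and `.mean()`. One practical difference from the built-in `sum()` is missing data: `Series.sum()` skips `NaN` values by default (controlled by its `skipna` parameter), while the built-in adds them like any other number and returns `nan`. A sketch with made-up numbers:

```python
import pandas as pd

values = pd.Series([0.5, 1.5, 2.0, 4.0])  # made-up example data
print(sum(values))    # built-in sum iterates over the Series: 8.0
print(values.sum())   # pandas method: 8.0
print(values.mean())  # 2.0

with_missing = pd.Series([1.0, None, 2.0])  # None is stored as NaN in a float Series
print(sum(with_missing))                # nan, because NaN propagates through addition
print(with_missing.sum())               # 3.0, because NaN is skipped by default
print(with_missing.sum(skipna=False))   # nan
```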