# Pandas

Plain Python is well suited to basic data manipulation tasks, but Pandas offers a higher-level interface designed specifically for data management and analysis. It provides a more efficient, intuitive, and expressive way to work with tabular data, and it is one of the most widely used Python libraries for data science.

Official introduction ("10 minutes to pandas"): https://pandas.pydata.org/docs/user_guide/10min.html

In [None]:
# Pandas is included with Anaconda (if installed as described in class).
# Otherwise, install it with "pip install pandas".

# By convention, pandas is imported with the alias "pd"
import pandas as pd

## Series and DataFrame

* A *Series* in Pandas is a one-dimensional labeled array that can hold data of any type.
* A *DataFrame* in Pandas is a two-dimensional table with rows and columns, where each column can have its own type.

Data sets in Pandas are usually two-dimensional tables (DataFrames). A Series is like a single column; a DataFrame is the whole table.
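
To make the column/table relationship concrete, here is a minimal sketch. The names `words`, `numbers`, and `table` are illustrative and not part of the original notebook. It builds a DataFrame from two Series and then pulls one column back out as a Series.

```python
import pandas as pd

# Two Series that share the same default integer index (0, 1, 2)
words = pd.Series(["one", "two", "three"])
numbers = pd.Series([1, 2, 3])

# Each Series becomes one column of the table
table = pd.DataFrame({"word": words, "number": numbers})
print(table)

# Selecting a single column gives back a Series
word_column = table["word"]
print(type(word_column).__name__)
```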

In [None]:
# Alternatively, import the classes directly:
# from pandas import Series, DataFrame

my_series = pd.Series(["one", "two", "three", "four", "five"])
print(my_series)

In [None]:
my_series.values

In [None]:
my_series.index

In [None]:
# Add explicit index labels
my_series = pd.Series(["one", "two", "three", "four", "five"], index=["a", "b", "c", "d", "e"])
print(my_series)
print(f"Value of index b: {my_series['b']}")
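
Once a Series has explicit labels, an element can be reached either by label or by position. The sketch below uses a small illustrative Series (`labelled` is a made-up name) and the `.loc` and `.iloc` accessors, which make the intent explicit.

```python
import pandas as pd

labelled = pd.Series(["one", "two", "three"], index=["a", "b", "c"])

# .loc selects by index label
by_label = labelled.loc["b"]

# .iloc selects by integer position (0-based)
by_position = labelled.iloc[1]

print(by_label, by_position)
```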

In [None]:
# Assignments (data manipulation)

my_series["b"] += "_more"
print(my_series)

In [None]:
# Get subset of data (filtering)

my_series = my_series[["a", "c", "d", "e"]]
print(my_series)

In [None]:
# Filter on conditions (filtering)

my_series = my_series[my_series != "three"]
print(my_series)
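
Conditional filtering works in two steps: the comparison first produces a boolean Series (a mask), and indexing with that mask keeps only the rows where it is `True`. A minimal sketch with illustrative names (`values`, `mask`, `kept`):

```python
import pandas as pd

values = pd.Series(["one", "three", "four"], index=["a", "c", "d"])

# Step 1: the comparison is evaluated element by element
mask = values != "three"
print(mask)

# Step 2: indexing with the mask keeps the rows marked True
kept = values[mask]
print(kept)
```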

In [None]:
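# Apply an operation to every element
# (for strings, * 2 repeats each value)

my_series = my_series * 2
print(my_series)

# The same operation wrapped in a function; this doubles the values once more
my_lambda = lambda s: s * 2
my_series = my_lambda(my_series)
print(my_series)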
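
What an operator does depends on the data in the Series. The sketch below (with illustrative names `words` and `counts`) contrasts `* 2` on strings, which repeats each value, with `* 2` on integers, which multiplies.

```python
import pandas as pd

words = pd.Series(["ab", "cd"])
counts = pd.Series([1, 2])

# Strings are repeated, numbers are multiplied
doubled_words = words * 2
doubled_counts = counts * 2

print(doubled_words.tolist())
print(doubled_counts.tolist())
```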


In [None]:
# Check whether a label (key) exists in the index

print("a" in my_series)
print("x" in my_series)
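
Note that `in` looks at the index labels, not at the stored values. To test for a value, one option is `Series.isin` combined with `.any()`. A small sketch with an illustrative Series named `letters`:

```python
import pandas as pd

letters = pd.Series(["one", "four"], index=["a", "d"])

label_found = "a" in letters        # "a" is an index label
value_as_label = "one" in letters   # "one" is a value, not a label

# Check the values explicitly
value_found = bool(letters.isin(["one"]).any())

print(label_found, value_as_label, value_found)
```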

In [None]:
# Build a Series from a dict: the keys become the index, the values become the data

cities = {'Oslo':634293, 'Bergen':271949, 'Kristiansand':85983}
city_series = pd.Series(cities)
print(city_series)

In [None]:
# Keep only the entries you want by passing the labels as the index

cities = {'Oslo':634293, 'Bergen':271949, 'Kristiansand':85983}
city_series = pd.Series(cities, index=["Oslo", "Bergen"])
print(city_series)
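
If the requested index contains a label that is not a key in the dict, pandas does not raise an error; it fills that entry with `NaN`, and the integers are converted to floats as a result. The sketch below reuses the `cities` dict and asks for "Trondheim", which is deliberately missing from it.

```python
import pandas as pd

cities = {"Oslo": 634293, "Bergen": 271949, "Kristiansand": 85983}

# "Trondheim" is not a key in the dict, so its value becomes NaN
subset = pd.Series(cities, index=["Oslo", "Trondheim"])
print(subset)

# isna() marks the missing entries
print(subset.isna())
```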

In [None]:
# Name the Series and its index
cities = {'Oslo':634293, 'Bergen':271949, 'Kristiansand':85983}
city_series = pd.Series(cities, name='Population')
# Equivalent: city_series.name = "Population"
city_series.index.name = "City"
print(city_series)
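
One place where these names pay off is when the Series is turned into a table. As a sketch, `reset_index()` moves the index into a regular column, and the index name and Series name become the column headers (`city_table` is an illustrative name).

```python
import pandas as pd

cities = {"Oslo": 634293, "Bergen": 271949, "Kristiansand": 85983}
city_series = pd.Series(cities, name="Population")
city_series.index.name = "City"

# The index becomes a column called "City"; the values keep the name "Population"
city_table = city_series.reset_index()
print(city_table)
```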