# Pandas Series and Dataframes

Pandas is a Python package to wrangle and analyze tabular data. It is built on top of NumPy and has become the core tool for doing data analysis in Python.

In [1]:
# load packages
import pandas as pd
import numpy as np

## Series

A series is a one-dimensional array of indexed data.

In [2]:
# A numpy array
arr = np.random.randn(4)  # 4 random values from the standard normal distribution
print(type(arr))
print(arr, "\n")

# A pandas series made from the previous array
s = pd.Series(arr)
print(type(s))
print(s)

<class 'numpy.ndarray'>
[ 0.0963139  -0.672408    1.38737327 -0.78509906] 

<class 'pandas.core.series.Series'>
0    0.096314
1   -0.672408
2    1.387373
3   -0.785099
dtype: float64
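The output above shows the two parts of a series: the index on the left and the values on the right. Here is a minimal sketch of accessing each part. It uses small made-up values and labels (`a`, `b`, `c`) rather than the random ones above:

```python
import numpy as np
import pandas as pd

# A small series with explicit labels (illustrative values)
s = pd.Series([0.5, -1.2, 2.0], index=["a", "b", "c"])

# The labels live in .index, the data in .to_numpy()
print(s.index)
print(s.to_numpy())

# Square brackets select by label; .iloc selects by position
print(s["b"])      # -1.2
print(s.iloc[1])   # -1.2
```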


### Creating a pandas.Series

The basic way to create a `pandas.Series` is to call:

s = pd.Series(data, index=index)

The `data` parameter can be:

- a list or NumPy array,
- a Python dictionary, or
- a single number, boolean (True/False), or string.

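The `index` parameter is optional. If it is omitted, pandas uses the dictionary keys (for dictionary data) or a default integer index starting at 0.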
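Here is a short sketch of how the `index` parameter interacts with the different kinds of `data`. The values and labels are illustrative only:

```python
import pandas as pd

# Without an index, pandas assigns the positions 0, 1, 2, ... as labels
s_default = pd.Series([10, 20, 30])
print(s_default.index.tolist())  # [0, 1, 2]

# A single value (here a boolean) is repeated once for every label
s_flags = pd.Series(True, index=["x", "y"])
print(s_flags)

# With list-like data, the index must have the same length as the data
try:
    pd.Series([1, 2, 3], index=["a", "b"])
    length_mismatch_raised = False
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)
```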

In [3]:
# A series from a numpy array 
pd.Series(np.arange(3), index=[2023, 2024, 2025])

2023    0
2024    1
2025    2
dtype: int64

In [4]:
# A series from a list of strings with default index
pd.Series(['EDS 220', 'EDS 222', 'EDS 223', 'EDS 242'])

0    EDS 220
1    EDS 222
2    EDS 223
3    EDS 242
dtype: object

In [5]:
# Construct dictionary (note: '3' is a string, so the resulting series has dtype object)
d = {'key_0':2, 'key_1':'3', 'key_2':5}

# Initialize series using a dictionary
pd.Series(d)

key_0    2
key_1    3
key_2    5
dtype: object
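When `data` is a dictionary, the keys become the index. If an `index` is also passed, it selects and orders the entries by key. The minimal sketch below assumes an all-integer version of the dictionary above and adds a made-up label `key_9` that has no matching key:

```python
import pandas as pd

d = {"key_0": 2, "key_1": 3, "key_2": 5}

# The index picks values by key and sets their order;
# a label with no matching key gets NaN, which makes the dtype float64
s = pd.Series(d, index=["key_2", "key_0", "key_9"])
print(s)
```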

In [6]:
# A series from a single value
pd.Series(3.0, index = ['A', 'B', 'C'])

A    3.0
B    3.0
C    3.0
dtype: float64

#### Simple operations

Arithmetic operations work on series, and so do most NumPy functions. Both are applied element by element. For example:

In [7]:
# Define a series
s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Divide each element in series by 10
print(s / 10, '\n')

# Take the exponential of each element in series
print(np.exp(s), '\n')

# Original series is unchanged
print(s)

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64
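Arithmetic between two series lines up elements by index label, not by position. The sketch below uses two hypothetical quiz-score series; the names `quiz_1` and `quiz_2` are illustrative:

```python
import pandas as pd

quiz_1 = pd.Series([98, 73, 65], index=["Andrea", "Beth", "Carolina"])
quiz_2 = pd.Series([90, 80], index=["Beth", "Andrea"])  # different order, one label missing

# Elements are matched by label; a label found in only one series gives NaN
total = quiz_1 + quiz_2
print(total)

# fill_value treats a missing label as 0 instead of producing NaN
total_filled = quiz_1.add(quiz_2, fill_value=0)
print(total_filled)
```

Whether filling with 0 is appropriate depends on what a missing entry means in the data.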


In [8]:
# Comparing a series to a value returns a boolean pandas.Series: True where the element satisfies the condition
s > 70

Andrea       True
Beth         True
Carolina    False
dtype: bool
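A boolean series like this one is most often used to filter the original series. Here is a brief sketch that reuses the scores above:

```python
import pandas as pd

s = pd.Series([98, 73, 65], index=["Andrea", "Beth", "Carolina"])

# Passing a boolean series inside [] keeps only the rows marked True
passing = s[s > 70]
print(passing)

# Conditions combine element-wise with & (and) and | (or); the parentheses are required
mid_range = s[(s > 60) & (s < 90)]
print(mid_range)
```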

#### Identifying missing values

In [9]:
# Series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan]) #  We can represent a missing, NULL, or NA value with the float value numpy.nan
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

In [10]:
# Check if series has NAs
s.hasnans

True

In [11]:
# Element-wise check for NAs: True where the value is NA, False otherwise
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
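After identifying NAs, common next steps are to count, drop, or fill them. Here is a minimal sketch on the same series; the fill value 0 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# Count the NAs: each True counts as 1 when summed
print(s.isna().sum())  # 2

# Drop the NA entries; the remaining rows keep their labels 0, 1, 3
print(s.dropna())

# Or replace each NA with a chosen value
print(s.fillna(0))

# Both methods return new series; s itself still contains the NAs
```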

#### Check-in

In [12]:
# Create a pandas.Series named s with four integer values, two of which are -999.
# The index of this series should be the letters A through D.
s = pd.Series([2, -999, 17, -999], index = ["A", "B", "C", "D"])
s

A      2
B   -999
C     17
D   -999
dtype: int64

In [13]:
# Use mask() to update the series s so that the -999 values are replaced by NA values.
# mask() returns a new series, so reassign it to s to keep the change
s = s.mask(s == -999)
s

A     2.0
B     NaN
C    17.0
D     NaN
dtype: float64
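`mask()` is not the only way to do this. The sketch below compares it with `where()` and `replace()`, which should all give the same result for this series. The condition `s == -999` targets the sentinel value exactly, whereas `s < 0` would also catch any other negative value:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, -999, 17, -999], index=["A", "B", "C", "D"])

# mask() replaces values with NaN where the condition is True
via_mask = s.mask(s == -999)

# where() is the complement: it keeps values where the condition is True
via_where = s.where(s != -999)

# replace() swaps a specific value for NaN directly
via_replace = s.replace(-999, np.nan)

print(via_mask)
```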

## Dataframes

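A `pandas.DataFrame` represents tabular data; we can think of it as a spreadsheet. Each column of a data frame is a `pandas.Series`, and all columns share the same index.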

### Creating a pandas.DataFrame

There are many ways to create a `pandas.DataFrame`. This section presents a simple one: passing a dictionary whose keys are the column names and whose values are the columns' data.

In [14]:
# Initialize dictionary with columns' data 
d = {'col_name_1' : pd.Series(np.arange(3)),
     'col_name_2' : pd.Series([3.1, 3.2, 3.3]),
     }

# Create data frame
df = pd.DataFrame(d)
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3
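Each column of a data frame is itself a `pandas.Series`. The small sketch below rebuilds the data frame above and inspects one column, the shape, and the column dtypes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_name_1": pd.Series(np.arange(3)),
                   "col_name_2": pd.Series([3.1, 3.2, 3.3])})

# Selecting one column with [] returns a pandas.Series
col = df["col_name_2"]
print(type(col))

# The column shares the data frame's index
print(col.index.equals(df.index))

# Basic structure: (rows, columns), and one dtype per column
print(df.shape)  # (3, 2)
print(df.dtypes)
```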


In [15]:
# Change index
df.index = ['a','b','c']
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3
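Assigning to `df.index` changes the data frame in place, and you can then use the new labels with `.loc`. The sketch below demonstrates this. It also shows that pandas rejects a list of labels whose length does not match the number of rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_name_1": np.arange(3),
                   "col_name_2": [3.1, 3.2, 3.3]})

# Replace the default 0, 1, 2 labels; df itself is modified
df.index = ["a", "b", "c"]

# Select a row by its new label
print(df.loc["b"])

# A list of the wrong length raises ValueError and leaves the index unchanged
try:
    df.index = ["a", "b"]
    length_error = False
except ValueError:
    length_error = True
print(length_error)
```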


#### Check-in

In [16]:
# Update the column names to C1 and C2 by updating this attribute.
df.columns = ["C1", "C2"]
df

   C1   C2
a   0  3.1
b   1  3.2
c   2  3.3
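Assigning to `df.columns` requires a name for every column. If only some columns need new names, `rename()` with a dictionary is an alternative. Here is a sketch on a rebuilt copy of the data frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_name_1": np.arange(3),
                   "col_name_2": [3.1, 3.2, 3.3]},
                  index=["a", "b", "c"])

# rename() maps old names to new ones, so one column can be renamed on its own;
# by default it returns a new data frame and leaves df unchanged
renamed = df.rename(columns={"col_name_2": "C2"})
print(renamed.columns.tolist())  # ['col_name_1', 'C2']
print(df.columns.tolist())       # ['col_name_1', 'col_name_2']
```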


#### Summary

Chapter 1 is an introduction to the pandas library for data analysis. This lesson focuses on two core objects: pandas.Series and pandas.DataFrame. It explains how to create a series, apply arithmetic operations, and identify NAs. It also explains how to make a data frame, change the index, and rename columns.