# Pandas Series and Data Frames

In [4]:
# import packages
import pandas as pd
import numpy as np

A `pandas.Series` is a one-dimensional array of indexed data.

The main difference between a `pandas.Series` and a NumPy array is that a `pandas.Series` has an index.

In [5]:
# A numpy array
arr = np.random.randn(4) # random values from std normal distribution
print(type(arr))
print(arr, "\n")

# A pandas series made from the previous array
s = pd.Series(arr)
print(type(s))
print(s)

<class 'numpy.ndarray'>
[-1.41708025  0.13088292  0.19266707  0.97941122] 

<class 'pandas.core.series.Series'>
0   -1.417080
1    0.130883
2    0.192667
3    0.979411
dtype: float64


A NumPy array (`np.ndarray`) can still be indexed by position, but that index is not part of the data structure and is not printed with the array.
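To make the distinction concrete, here is a minimal sketch. The values and the labels `'a'`, `'b'`, `'c'` are made up for illustration and are not part of the cells above. It shows that a series stores its index alongside its values, while the array only supports positional access.

```python
import numpy as np
import pandas as pd

# Illustrative values and labels (assumptions, not from the notebook)
arr = np.array([10.0, 20.0, 30.0])
s = pd.Series(arr, index=['a', 'b', 'c'])

# The index is stored as an attribute of the series
print(s.index)

# The underlying values can be retrieved as a NumPy array
print(type(s.to_numpy()))

# The series supports label-based access; the array only positional access
print(s['b'])   # element labelled 'b'
print(arr[1])   # element at position 1
```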

Basic method for creating a `pandas.Series`:

s = pd.Series(data, index=index)
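When `data` is a list or array, `index` needs to have the same number of elements as `data`. The following is a small sketch of that constraint, using a hypothetical `data` list and labels chosen only for illustration:

```python
import pandas as pd

# Hypothetical data for illustration
data = [0.25, 0.5, 0.75]

# The index has one label per element of data
s_ok = pd.Series(data, index=['a', 'b', 'c'])
print(s_ok)

# A length mismatch between data and index raises a ValueError
try:
    pd.Series(data, index=['a', 'b'])
except ValueError as err:
    print('ValueError:', err)
```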

In [7]:
# A series from a numpy array 
pd.Series(np.arange(3), index=[2023, 2024, 2025])

2023    0
2024    1
2025    2
dtype: int64

In [8]:
# A series from a list of strings with default index
pd.Series(['EDS 220', 'EDS 222', 'EDS 223', 'EDS 242'])

0    EDS 220
1    EDS 222
2    EDS 223
3    EDS 242
dtype: object

In [9]:
# Construct dictionary
d = {'key_0':2, 'key_1':'3', 'key_2':5}

# Initialize series using a dictionary
pd.Series(d)

key_0    2
key_1    3
key_2    5
dtype: object
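The `object` dtype in the output above comes from the dictionary mixing integers with the string `'3'`. As a rough sketch, with two hypothetical dictionaries `d_mixed` and `d_ints` introduced only for comparison, the dtype changes once all values are integers:

```python
import pandas as pd

# Same keys; only the type of the value under 'key_1' differs
d_mixed = {'key_0': 2, 'key_1': '3', 'key_2': 5}   # int and str values
d_ints = {'key_0': 2, 'key_1': 3, 'key_2': 5}      # int values only

s_mixed = pd.Series(d_mixed)
s_ints = pd.Series(d_ints)

# Mixed value types fall back to the generic object dtype
print(s_mixed.dtype)

# Homogeneous integers get an integer dtype
print(s_ints.dtype)

# In both cases the dictionary keys become the index
print(list(s_ints.index))
```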

In [10]:
# If we only provide a single number, boolean, or string as the data for the series, 
# we need to provide an index. The value will be repeated to match the length of the index. 
# Here, we create a series from a single float number with an index given by a list of strings:
pd.Series(3.0, index = ['A', 'B', 'C'])

A    3.0
B    3.0
C    3.0
dtype: float64

In [11]:
# Define a series
s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Divide each element in series by 10
print(s / 10, '\n')

# Take the exponential of each element in series
print(np.exp(s), '\n')

# Original series is unchanged
print(s)

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64


In [12]:
# We can also produce new pandas.Series with True/False values 
# indicating whether the elements in a series satisfy a condition or not:

s > 70

Andrea       True
Beth         True
Carolina    False
dtype: bool
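A boolean series like this one is commonly used to select the elements that satisfy the condition. Here is a minimal sketch that reuses the scores from the earlier cell; the names `scores`, `above_70`, and `passed` are illustrative only.

```python
import pandas as pd

scores = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Boolean series: True where the score is above 70
above_70 = scores > 70

# Indexing with the boolean series keeps only the True elements
passed = scores[above_70]
print(passed)

# Summing counts the True values, since True is treated as 1
print(above_70.sum())
```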

In [13]:
# In pandas we can represent a missing, NULL, or NA value with the float value numpy.nan, 
# which stands for “not a number”. 

# Series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan])
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

The `hasnans` attribute of a `pandas.Series` is `True` if the series contains any NA values and `False` otherwise:

In [14]:
# Check if series has NAs
s.hasnans

True

After detecting that there are NA values, we might be interested in knowing which elements of the series are NAs. We can do this using the `isna` method:

In [15]:
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
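The boolean result of `isna` can be combined with other operations, for example to count the NA values or to find their index labels. A short sketch under that assumption, using a copy of the series named `s_na` for illustration:

```python
import numpy as np
import pandas as pd

s_na = pd.Series([1, 2, np.nan, 4, np.nan])

# Count the NA values: each True counts as 1 when summed
n_missing = s_na.isna().sum()
print(n_missing)

# Index labels of the elements that are NA
na_labels = s_na[s_na.isna()].index.tolist()
print(na_labels)

# notna is the complement of isna
print(s_na.notna().sum())
```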

## Check-in

1. The integer number -999 is often used to represent missing values. Create a `pandas.Series` named `s` with four integer values, two of which are -999. The index of this series should be the letters A through D.

In [18]:
s = pd.Series([26, -999, 34, -999], index = ['A', 'B', 'C', 'D'])
print(s)

A     26
B   -999
C     34
D   -999
dtype: int64


2. In the pandas.Series documentation, look for the method mask(). Use this method to update the series s so that the -999 values are replaced by NA values. HINT: check the first example in the method’s documentation.

In [23]:
# Replace values where the condition is True (with NaN by default)
# mask(cond[, other, inplace, axis, level])
s = s.mask(s == -999)
s

A    26.0
B     NaN
C    34.0
D     NaN
dtype: float64
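`mask` is called on the series itself and takes a boolean condition; wherever the condition is `True`, the value is replaced (by NaN when no replacement is given). Going slightly beyond what the exercise asks, the sketch below compares it with two related methods, `where` (which keeps values where the condition is `True`) and `replace` (which swaps a specific value). It works on a separate copy of the check-in series, and the names `s_demo`, `masked`, `kept`, and `replaced` are illustrative.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([26, -999, 34, -999], index=['A', 'B', 'C', 'D'])

# mask: replace values where the condition is True (NaN by default)
masked = s_demo.mask(s_demo == -999)

# where: keep values where the condition is True, replace the others
kept = s_demo.where(s_demo != -999)

# replace: swap one specific value for another
replaced = s_demo.replace(-999, np.nan)

print(masked)
print(kept)
print(replaced)

# Check that the NA values are now detected
print(masked.hasnans)
```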