# Pandas core data structures: Series

## Creating a `pandas.Series`

- A `Series` is a one-dimensional data structure
- It has values and an index
- If you don't specify an index, pandas creates one automatically (0, 1, 2, 3)
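
As a quick illustration of the last bullet, the sketch below builds a Series without an index and inspects the one pandas generates. The name `default_series` is only illustrative.

```python
import pandas as pd

# No index is passed, so pandas assigns a default integer index.
default_series = pd.Series([10, 20, 30])

# The default index is a RangeIndex starting at 0.
print(default_series.index)
print(list(default_series.index))
```

The default index behaves like positions 0, 1, 2, but it is still a label index, which matters once a Series is filtered or sorted.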

In [39]:
"creating a series using a simple Python list"

import pandas as pd

data_list = [1, 2, 3]

print("Creating a series using a simple Python list")
data_series = pd.Series(data=data_list)

print(data_series)

Creating a series using a simple Python list
0    1
1    2
2    3
dtype: int64


In [43]:
"creating a series with a list and custom index"

import pandas as pd

scores = pd.Series(
    data=[88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"],
    name="scores"
)

print("Creating a series with custom index")
print(scores)


Creating a series with custom index
Alice      88
Bob        92
Charlie    79
Diana      95
Name: scores, dtype: int64


In [42]:
"Creating a pandas series using a Python dictionary. Dictionary keys become the index."

grade_dict = {
    "Alice": 88,
    "Bob": 92,
    "Charlie": 79,
    "Diana": 95
}

scores_from_dict = pd.Series(grade_dict)

print("\nCreating a Series from a dictionary:\n")
print(scores_from_dict)



Creating a Series from a dictionary:

Alice      88
Bob        92
Charlie    79
Diana      95
dtype: int64
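
When a dictionary is combined with an explicit `index`, pandas looks up each index label among the dictionary keys. The sketch below assumes a hypothetical extra label, "Eve", that is not in the dictionary, to show what happens to a missing key.

```python
import pandas as pd

grade_dict = {"Alice": 88, "Bob": 92, "Charlie": 79, "Diana": 95}

# index= selects and orders entries by key.
# "Eve" is a hypothetical label with no matching key, so its value is NaN.
subset = pd.Series(grade_dict, index=["Diana", "Alice", "Eve"])

print(subset)
print("Missing labels:", subset[subset.isna()].index.tolist())
```

Because NaN is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.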


In [34]:
"Creating a pandas series using a 1-d numpy array"
import numpy as np

color_array = np.array(['red', 'blue', 'green'])

colors_from_array = pd.Series(data=color_array)

print("\nCreating a Series from a numpy array:\n")
print(colors_from_array)



Creating a Series from a numpy array:

0      red
1     blue
2    green
dtype: object


## Inspecting a series
Key attributes: `.values`, `.index`, `.dtype`, `.shape`, and `.name`.

In [46]:
import pandas as pd

# Create a sample Series
scores = pd.Series(
    data=[88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"],
    name="Exam Scores"
)

print("Inspecting the Series:\n")
print(scores)

print("\nInspecting .values (underlying data stored in the Series):")
print("scores.values =", scores.values)

print("\nInspecting .index (labels associated with each value):")
print("scores.index =", scores.index)

print("\nInspecting .dtype (data type of the values):")
print("scores.dtype =", scores.dtype)

print("\nInspecting .shape (dimensions of the Series):")
print("scores.shape =", scores.shape)

print("\nInspecting .name (optional name of the Series):")
print("scores.name =", scores.name)


Inspecting the Series:

Alice      88
Bob        92
Charlie    79
Diana      95
Name: Exam Scores, dtype: int64

Inspecting .values (underlying data stored in the Series):
scores.values = [88 92 79 95]

Inspecting .index (labels associated with each value):
scores.index = Index(['Alice', 'Bob', 'Charlie', 'Diana'], dtype='object')

Inspecting .dtype (data type of the values):
scores.dtype = int64

Inspecting .shape (dimensions of the Series):
scores.shape = (4,)

Inspecting .name (optional name of the Series):
scores.name = Exam Scores
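
The attributes above can also be converted into plain NumPy and Python objects. This is a minimal sketch using `Series.to_numpy()` and `Index.tolist()`; the variable names `values_array` and `labels` are illustrative.

```python
import pandas as pd

scores = pd.Series(
    data=[88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"],
    name="Exam Scores"
)

# to_numpy() returns the underlying data as a NumPy array.
values_array = scores.to_numpy()

# tolist() turns the index labels into a plain Python list.
labels = scores.index.tolist()

print(type(values_array).__name__, values_array)
print(labels)

# len() and .size both report the number of elements.
print(len(scores), scores.size)
```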


## Accessing data

- By label: `.loc`
- By position: `.iloc`
- By Boolean condition

In [47]:
"Access by label: scores['Alice'] is equivalent to scores.loc['Alice']"

print("Accessing data via label:\n")
print('scores["Alice"] = ', scores["Alice"])
print('scores.loc["Alice"] = ', scores.loc['Alice'])

Accessing data via label:

scores["Alice"] =  88
scores.loc["Alice"] =  88


In [21]:
"Access by position"

print("\nAccessing data via position:\n")
print("scores.iloc[0] = ", scores.iloc[0])



Accessing data via position:

scores.iloc[0] =  88


In [22]:
"Accessing multiple elements"

print("\nAccessing multiple values by label:\n")
print('scores[["Alice", "Diana"]] =\n', scores[["Alice", "Diana"]])

print("\nAccessing multiple values by position:\n")
print("scores.iloc[[0, 3]] =\n", scores.iloc[[0, 3]])



Accessing multiple values by label:

scores[["Alice", "Diana"]] =
 Alice    88
Diana    95
dtype: int64

Accessing multiple values by position:

scores.iloc[[0, 3]] =
 Alice    88
Diana    95
dtype: int64
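
Slices follow the same label versus position split, with one difference worth seeing in code: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position. A minimal sketch, reusing the same example data:

```python
import pandas as pd

scores = pd.Series(
    data=[88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"]
)

# Label slice: both endpoints are included (Alice, Bob, Charlie).
by_label = scores.loc["Alice":"Charlie"]

# Position slice: the stop position is excluded (positions 0 and 1 only).
by_position = scores.iloc[0:2]

print(by_label)
print(by_position)
```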


In [23]:
"Boolean indexing (filtering)"

print("\nBoolean filtering:\n")
print("scores >= 90:\n", scores >= 90)

print("\nScores greater than or equal to 90:\n")
print(scores[scores >= 90])




Boolean filtering:

scores >= 90:
 Alice      False
Bob         True
Charlie    False
Diana       True
dtype: bool

Scores greater than or equal to 90:

Bob      92
Diana    95
dtype: int64
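
Boolean masks can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch with illustrative variable names:

```python
import pandas as pd

scores = pd.Series(
    data=[88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"]
)

# Both conditions must hold: 80 <= score < 95.
in_range = scores[(scores >= 80) & (scores < 95)]

# Either condition may hold: below 80 or at least 95.
extremes = scores[(scores < 80) | (scores >= 95)]

# Negation: everything that is NOT at least 90.
below_90 = scores[~(scores >= 90)]

print(in_range)
print(extremes)
print(below_90)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, so the symbolic operators are required here.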


## Vectorized operations
No explicit loops are needed: arithmetic and comparison operations apply element-wise to every value automatically.

In [48]:
"Arithmetic operations"

import pandas as pd

scores = pd.Series(
    [88, 92, 79, 95],
    index=["Alice", "Bob", "Charlie", "Diana"]
)

print("Original Series:\n")
print(scores)

print("\nAdding 5 points to each score (vectorized operation):")
print("scores + 5 =\n")
print(scores + 5)

print("\nMultiplying each score by 1.1 (vectorized operation):")
print("scores * 1.1 =\n")
print(scores * 1.1)


Original Series:

Alice      88
Bob        92
Charlie    79
Diana      95
dtype: int64

Adding 5 points to each score (vectorized operation):
scores + 5 =

Alice       93
Bob         97
Charlie     84
Diana      100
dtype: int64

Multiplying each score by 1.1 (vectorized operation):
scores * 1.1 =

Alice       96.8
Bob        101.2
Charlie     86.9
Diana      104.5
dtype: float64


In [49]:
"Comparison operations (returns Boolean Series). Vectorized comparisons return a Boolean Series of the same shape."

print("\nComparing each value to a condition (scores >= 90):")
print("scores >= 90 =\n")
print(scores >= 90)



Comparing each value to a condition (scores >= 90):
scores >= 90 =

Alice      False
Bob         True
Charlie    False
Diana       True
dtype: bool


In [52]:
"Operations between two Series (automatic index alignment). Pandas aligns values by index labels, not by position."

bonus = pd.Series(
    [5, 10, 0],
    index=["Alice", "Bob", "Eve"]
)

print("\nFirst Series (scores):\n")
print(scores)

print("\nSecond Series (bonus points):\n")
print(bonus)

print("\nAdding two Series together (index alignment happens automatically):")
print("scores + bonus =\n")
print(scores + bonus)



First Series (scores):

Alice      88
Bob        92
Charlie    79
Diana      95
dtype: int64

Second Series (bonus points):

Alice     5
Bob      10
Eve       0
dtype: int64

Adding two Series together (index alignment happens automatically):
scores + bonus =

Alice       93.0
Bob        102.0
Charlie      NaN
Diana        NaN
Eve          NaN
dtype: float64

If an index label exists in only one Series, the result for that label is NaN, and the result is stored as float64 to accommodate it.
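
Alignment also means that the order of the labels does not matter. The sketch below uses two hypothetical Series, `first` and `second`, that share the same labels in a different order.

```python
import pandas as pd

first = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Same labels as `first`, but listed in a different order.
second = pd.Series([30, 10, 20], index=["c", "a", "b"])

# Values are matched by label, not by position.
total = first + second

print(total)
print("total['a'] =", total["a"])  # 1 + 10
```

Every label appears in both Series here, so no NaN is produced.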


In [53]:
"Handling missing values in vectorized operations"

print("\nHandling missing values in vectorized operations:")
print("Notice NaN where data is missing.\n")
print(scores + bonus)

print("\nWe can use Series.add with fill_value=0")
print(scores.add(bonus, fill_value=0))



Handling missing values in vectorized operations:
Notice NaN where data is missing.

Alice       93.0
Bob        102.0
Charlie      NaN
Diana        NaN
Eve          NaN
dtype: float64

We can use Series.add with fill_value=0
Alice       93.0
Bob        102.0
Charlie     79.0
Diana       95.0
Eve          0.0
dtype: float64


## Vectorized aggregation functions



In [54]:
print("\nVectorized aggregation functions:")

print("\nMean of scores:")
print("scores.mean() =", scores.mean())

print("\nMaximum score:")
print("scores.max() =", scores.max())

print("\nStandard deviation of scores:")
print("scores.std() =", scores.std())



Vectorized aggregation functions:

Mean of scores:
scores.mean() = 88.5

Maximum score:
scores.max() = 95

Standard deviation of scores:
scores.std() = 6.95221787153807


In [55]:
"Applying NumPy-style functions"

import numpy as np

print("\nApplying NumPy functions to a Series (still vectorized):")
print("np.log(scores) =\n")
print(np.log(scores))



Applying NumPy functions to a Series (still vectorized):
np.log(scores) =

Alice      4.477337
Bob        4.521789
Charlie    4.369448
Diana      4.553877
dtype: float64


## Common Series operations


In [56]:
"Setup & Example Series"

import pandas as pd

# Categorical Series (with repeated values)
cities = pd.Series(
    ["NYC", "LA", "NYC", "Chicago", "LA", "NYC", "Boston"],
    name="City"
)

# Numeric Series
scores = pd.Series(
    [88, 92, 79, 95, 92],
    name="Scores"
)

print("Categorical Series:\n")
print(cities)

print("\nNumeric Series:\n")
print(scores)


Categorical Series:

0        NYC
1         LA
2        NYC
3    Chicago
4         LA
5        NYC
6     Boston
Name: City, dtype: object

Numeric Series:

0    88
1    92
2    79
3    95
4    92
Name: Scores, dtype: int64


In [66]:
"Previewing Data: .head() and .tail()"

print("Previewing the first 3 values using .head():")
print("cities.head(3) =\n", cities.head(3))

print("\nPreviewing the last 3 values using .tail():")
print("cities.tail(3) =\n", cities.tail(3))


Previewing the first 3 values using .head():
cities.head(3) =
 0    NYC
1     LA
2    NYC
Name: City, dtype: object

Previewing the last 3 values using .tail():
cities.tail(3) =
 4        LA
5       NYC
6    Boston
Name: City, dtype: object


In [62]:
"Aggregation Methods (Numeric Series). Note: .count() ignores missing values, while .size counts every element."

print("Aggregation methods on numeric Series:\n")

print("scores.sum()      =", scores.sum())
print("scores.mean()     =", scores.mean())
print("scores.min()      =", scores.min())
print("scores.max()      =", scores.max())
print("scores.median()   =", scores.median())
print("scores.std()      =", scores.std())
print("scores.count()    =", scores.count())
print("scores.size       =", scores.size)

Aggregation methods on numeric Series:

scores.sum()      = 446
scores.mean()     = 89.2
scores.min()      = 79
scores.max()      = 95
scores.median()   = 92.0
scores.std()      = 6.220932405998316
scores.count()    = 5
scores.size       = 5
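
The two results above are equal only because this Series has no missing values. A small sketch with a hypothetical Series, `with_gap`, shows where `.count()` and `.size` diverge:

```python
import pandas as pd

# One of the three entries is missing.
with_gap = pd.Series([88, None, 79])

# .count() skips missing values; .size counts every element.
print("with_gap.count() =", with_gap.count())
print("with_gap.size    =", with_gap.size)
```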


In [65]:
"Summary Statistics of the series: .describe()"

print("Summary statistics of numeric Series:\n")
print(scores.describe())

print("\nSummary statistics of categorical Series:\n")
print(cities.describe())

Summary statistics of numeric Series:

count     5.000000
mean     89.200000
std       6.220932
min      79.000000
25%      88.000000
50%      92.000000
75%      92.000000
max      95.000000
Name: Scores, dtype: float64

Summary statistics of categorical Series:

count       7
unique      4
top       NYC
freq        3
Name: City, dtype: object


In [59]:
"Check Distinct Values: .unique() and .nunique()"
print("Finding distinct values using .unique():")
print("cities.unique() =", cities.unique())

print("\nCounting distinct values using .nunique():")
print("cities.nunique() =", cities.nunique())


Finding distinct values using .unique():
cities.unique() = ['NYC' 'LA' 'Chicago' 'Boston']

Counting distinct values using .nunique():
cities.nunique() = 4


In [60]:
"""
Frequency Tables: .value_counts()
normalize=True returns proportions instead of counts.
"""

print("Frequency table using .value_counts():\n")
print(cities.value_counts())

print("\nNormalized frequency (proportions):\n")
print(cities.value_counts(normalize=True))


Frequency table using .value_counts():

City
NYC        3
LA         2
Chicago    1
Boston     1
Name: count, dtype: int64

Normalized frequency (proportions):

City
NYC        0.428571
LA         0.285714
Chicago    0.142857
Boston     0.142857
Name: proportion, dtype: float64
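
By default, `.value_counts()` leaves missing values out of the table. The sketch below assumes a hypothetical Series, `cities_with_gap`, containing one missing entry, and uses `dropna=False` to include it.

```python
import pandas as pd

cities_with_gap = pd.Series(["NYC", None, "LA", "NYC"])

# Default: missing values are excluded from the counts.
counts_default = cities_with_gap.value_counts()

# dropna=False adds a row for the missing value.
counts_all = cities_with_gap.value_counts(dropna=False)

print(counts_default)
print(counts_all)
```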


In [61]:
"Sorting Values and Index"

print("Sorting values in numeric Series:\n")
print("scores.sort_values() =\n")
print(scores.sort_values())

print("\nSorting by index:\n")
print("scores.sort_index() =\n")
print(scores.sort_index())


Sorting values in numeric Series:

scores.sort_values() =

2    79
0    88
1    92
4    92
3    95
Name: Scores, dtype: int64

Sorting by index:

scores.sort_index() =

0    88
1    92
2    79
3    95
4    92
Name: Scores, dtype: int64
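
Both sorting methods accept `ascending=False` to reverse the order, and both return a new Series rather than changing the original. A minimal sketch:

```python
import pandas as pd

scores = pd.Series([88, 92, 79, 95, 92], name="Scores")

# Highest score first.
highest_first = scores.sort_values(ascending=False)

# Largest index label first.
reversed_index = scores.sort_index(ascending=False)

print(highest_first)
print(reversed_index)

# The original Series keeps its order.
print(scores.tolist())
```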


## Inspecting and handling missing data

In [70]:
import pandas as pd

scores = pd.Series(
    [88, 92, None, 95],
    index=["Alice", "Bob", "Charlie", "Diana"],
    name="Exam Scores"
)

print("Original Series with missing values:\n")
print(scores)

print("\nDetecting missing values using .isna():")
print(scores.isna())

print("\nYou can count missing values using .isna().sum():")
print("The total number of missing values is:", scores.isna().sum())

print("\nDetecting non-missing values using .notna():")
print(scores.notna())

print("\nDropping missing values using .dropna():")
print(scores.dropna())

print("\nFilling missing values with a constant using .fillna(0):")
print(scores.fillna(0))

print("\nFilling missing values with the mean:")
print(scores.fillna(scores.mean()))


Original Series with missing values:

Alice      88.0
Bob        92.0
Charlie     NaN
Diana      95.0
Name: Exam Scores, dtype: float64

Detecting missing values using .isna():
Alice      False
Bob        False
Charlie     True
Diana      False
Name: Exam Scores, dtype: bool

You can count missing values using .isna().sum():
The total number of missing values is: 1

Detecting non-missing values using .notna():
Alice       True
Bob         True
Charlie    False
Diana       True
Name: Exam Scores, dtype: bool

Dropping missing values using .dropna():
Alice    88.0
Bob      92.0
Diana    95.0
Name: Exam Scores, dtype: float64

Filling missing values with a constant using .fillna(0):
Alice      88.0
Bob        92.0
Charlie     0.0
Diana      95.0
Name: Exam Scores, dtype: float64

Filling missing values with the mean:
Alice      88.000000
Bob        92.000000
Charlie    91.666667
Diana      95.000000
Name: Exam Scores, dtype: float64
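
Note that `.dropna()` and `.fillna()` return a new Series and leave the original untouched, so the result has to be assigned to a name to be kept. A minimal sketch, where `filled` is an illustrative name:

```python
import pandas as pd

scores = pd.Series(
    [88, 92, None, 95],
    index=["Alice", "Bob", "Charlie", "Diana"],
    name="Exam Scores"
)

# .mean() skips the missing value: (88 + 92 + 95) / 3.
mean_score = scores.mean()

# fillna returns a new Series; assign it to keep the result.
filled = scores.fillna(mean_score)

print("Missing in original:", scores.isna().sum())
print("Missing in filled  :", filled.isna().sum())
print("Charlie's filled score:", round(filled["Charlie"], 2))
```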
