# Day X: Session Y - Arrays and Series

[Link to session webpage](https://eds-217-essential-python.github.io/course-materials/interactive-sessions/3c_arrays_and_series.html)


In [1]:
import numpy as np
import pandas as pd

## Arrays

array: an ordered collection of items. Arrays come from NumPy; lists are built into Python and are more general.

NumPy arrays store their values much more compactly than lists, so they take up much less memory.
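
A rough way to see the difference in memory use, as a sketch only: the 1000-element size and the `int64` dtype are choices made for this illustration, and the exact list size depends on the Python version.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.arange(1000, dtype=np.int64)

# Bytes in the array's data buffer: 1000 elements * 8 bytes each
array_bytes = arr.nbytes

# A list holds pointers to separate int objects, so count the list and its items
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print("array bytes:", array_bytes)
print("list bytes: ", list_bytes)
```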

Data frames can be coerced into np arrays for calculation
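
A minimal sketch of that conversion using `DataFrame.to_numpy()`. The column names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with two numeric columns
df = pd.DataFrame({"temp_c": [10.0, 12.5, 15.0], "depth_m": [1.0, 2.0, 3.0]})

# Convert to a plain 2D NumPy array (the column labels are dropped)
values = df.to_numpy()
print(type(values))
print(values.shape)

# Column means computed on the array; axis=0 runs down the rows
col_means = values.mean(axis=0)
print(col_means)
```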

### Create numpy arrays

In [7]:
# create a 1D array from a flat list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_array) # arrays print without commas, which helps tell them apart from lists, especially with large collections
print(my_list)
type(my_array)

[1 2 3 4 5]
[1, 2, 3, 4, 5]


numpy.ndarray

In [10]:
# create a 2D array 
my_lists = [[1,2,3], [4,5,6]] # a list of two separate lists
my_arrays = np.array(my_lists)
print(my_arrays)
print(type(my_arrays))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>
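
Once a 2D array exists, its `shape` and `ndim` attributes describe its layout, and the `axis` argument picks the direction of a calculation. A small sketch that rebuilds the same array:

```python
import numpy as np

my_arrays = np.array([[1, 2, 3], [4, 5, 6]])

print(my_arrays.shape)  # (rows, columns)
print(my_arrays.ndim)   # number of dimensions

# axis=1 works across each row, axis=0 works down each column
row_means = my_arrays.mean(axis=1)
col_sums = my_arrays.sum(axis=0)
print(row_means)
print(col_sums)
```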


### Basic Array Operations

`min`, `max`, `mean`, etc.

Objects often provide better ways of doing things than the functions in the global namespace!

This is the core difference of an object-oriented language.

Transformations (multiplying, adding, etc.) are much easier with arrays.

Arrays are indexed by position, not by label: an index can only ever be a number.
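
A short sketch of both ideas: positional indexing, and the same statistic as a method on the array object or as a NumPy function.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# Positions are integers starting at 0; negative positions count from the end
first = my_array[0]
last = my_array[-1]
middle = my_array[1:4]  # positions 1, 2, 3 (the stop position is excluded)

# The same calculation as a method on the object and as a NumPy function
method_max = my_array.max()
function_max = np.max(my_array)

print(first, last, middle)
print(method_max, function_max)
```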

In [27]:
# the built-in min and max functions are still available
min(3,4,5)

# But NumPy provides its own functions for basic operations on arrays
np.min(my_array)
print("np.min", np.min(my_array)) # passing separate arguments to print also works, instead of an f-string
print(f"np.min of {my_array} is {np.min(my_array)}") # an f-string lets you embed variables and expressions in the printed text

# max
print(f"np.max of {my_array} is {np.max(my_array)}")

# mean
print(f"np.mean of {my_array} is {np.mean(my_array)}")

# standard deviation
print(f"np.std of {my_array} is {np.std(my_array):.2f}") # :.2f formats the value to two decimal places
# This only changes the displayed output. To change the number itself you would have to round it.

# array transformations
# multiply every element in an array by 2
print(my_array * 2)
print(my_list * 2) # with a list, * 2 repeats the list instead of doubling each element

# add 2 to every element in an array
print(my_array + 2)

np.min 1
np.min of [1 2 3 4 5] is 1
np.max of [1 2 3 4 5] is 5
np.mean of [1 2 3 4 5] is 3.0
np.std of [1 2 3 4 5] is 1.41
[ 2  4  6  8 10]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[3 4 5 6 7]
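
For comparison, a sketch of what the same transformations take with a plain list, and how `+` differs between the two types:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

# Doubling each element of a list needs a comprehension (or a loop)
doubled_list = [x * 2 for x in my_list]
doubled_array = my_array * 2

# + concatenates lists but adds arrays element by element
joined = my_list + my_list
summed = my_array + my_array

print(doubled_list)
print(joined)
print(summed)
```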


## Using Pandas Series

A Series is a 1D labelled array.

### Creating a series
We create them with the `pd.Series()` constructor. The capital S follows the Python naming convention for classes, so calling it creates a Series object.

Once we have arrays of data with indices we can control, we can build structures as complex as we want.


In [31]:
# Create a Series from a list
s1 = pd.Series([1, 2, 3, 4, 5])
print("Series from list:\n", s1)

# Create a Series with custom index
s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print("\nSeries with custom index:\n", s2)
# the labels (the index) have changed!

Series from list:
 0    1
1    2
2    3
3    4
4    5
dtype: int64

Series with custom index:
 a    10
b    20
c    30
d    40
e    50
dtype: int64
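
Another common way to get a labelled Series is to build it from a dictionary, where the keys become the index. A small sketch; the name `s3` and its contents are made up for illustration.

```python
import pandas as pd

# Dictionary keys become the index labels, in insertion order
s3 = pd.Series({"a": 10, "b": 20, "c": 30})
print(s3)

labels = s3.index.tolist()
values = s3.to_numpy()  # the underlying values as a NumPy array
print(labels)
print(values)
```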


In [34]:
s2['a'] # index by label; positional access such as s2[0] is deprecated in recent pandas, so use s2.iloc[0] for position

10
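
To make the intent explicit, pandas offers `.loc` for labels and `.iloc` for integer positions. A minimal sketch that rebuilds the same Series:

```python
import pandas as pd

s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = s2.loc['a']      # look up by index label
by_position = s2.iloc[0]    # look up by integer position

print(by_label, by_position)
```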

### Basic series operations

#### Accessing and slicing

Everything we've learned about slicing also works with Series and arrays.

In [37]:
# access based on the index label
print("The value at index 'c':", s2['c'])

# access based on position (like a list index position)
print("The value of the first three elements:\n", s2[:3])

The value at index 'c': 30
The value of the first three elements:
 a    10
b    20
c    30
dtype: int64
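
One difference worth knowing when slicing a Series: a label slice with `.loc` includes the end label, while a position slice with `.iloc` excludes the stop position, as list slicing does. A sketch on the same data:

```python
import pandas as pd

s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

label_slice = s2.loc['a':'c']    # includes the end label 'c'
position_slice = s2.iloc[0:3]    # excludes position 3

print(label_slice)
print(position_slice)
```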


#### Arithmetic and Stats

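These work the same way as with arrays, with one difference: `Series.std()` computes the sample standard deviation (`ddof=1`) by default, while `np.std()` computes the population standard deviation (`ddof=0`).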

In [46]:
# add 5 to every element
print(s2 + 5)

# calculate mean
print(s2.mean())

# standard deviation
print(f"Standard deviation: {s2.std():.2f}")

# median
print(s2.median())

a    15
b    25
c    35
d    45
e    55
dtype: int64
30.0
Standard deviation: 15.81
30.0
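
The 15.81 above comes from the pandas default of `ddof=1`. A sketch comparing it with the NumPy default on the same values:

```python
import numpy as np
import pandas as pd

values = [10, 20, 30, 40, 50]
s2 = pd.Series(values, index=['a', 'b', 'c', 'd', 'e'])
arr = np.array(values)

sample_std = s2.std()          # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(arr)   # NumPy default: ddof=0 (divide by n)

print(f"pandas default: {sample_std:.2f}")
print(f"NumPy default:  {population_std:.2f}")

# Passing ddof explicitly makes the two libraries agree
print(f"NumPy with ddof=1: {np.std(arr, ddof=1):.2f}")
```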


In [51]:
#help(s2.mean) # pass the function itself, without parentheses
# don't write help(s2.mean()): that calls the function first and shows help for the number it returns
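
A small sketch of the distinction between a method and the result of calling it. The variable names are made up for illustration.

```python
import pandas as pd

s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

mean_method = s2.mean    # the method itself (no parentheses): this is what help() needs
mean_value = s2.mean()   # the result of calling it: a number

print(callable(mean_method))
print(callable(mean_value))
print(mean_value)
```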