# Python Libraries and Numpy

In this notebook, we will take a look at Python libraries in general and a specific library
that is useful in data science, called `numpy`.

## Importing Libraries

A library is a collection of functions and tools that typically serve a related purpose. Libraries
save you from having to write a lot of code yourself. In Python there are libraries, modules and
packages, which all serve the same general role but are technically different. However, for ease of
discussion, we will use the term library to cover all of these.

To use a library, you must first import it using the `import` keyword. You can also use the `as` keyword to provide an alias, for example if you don't want to have to type a long library name repeatedly.
In the example below, we import the library `my_library` using the alias `ml`.
```
import my_library as ml
```
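
As a concrete illustration, the short sketch below imports Python's built-in `statistics` module under the alias `st`. The choice of module and alias here is only for illustration; any library can be imported the same way.

```python
# Import the standard-library statistics module under a shorter alias
import statistics as st

# Functions in the library are reached through the alias, using a dot
average = st.mean([2, 4, 6])
print(average)
```

Once the alias is defined, you use `st` everywhere you would otherwise have typed `statistics`.
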
## Numpy

The `numpy` library (pronounced "num-pie") is a library of numerical functions and tools that are useful in scientific computing, data science and machine learning.

Let's import it and use some of its features. We will start by looking at what you can do with numpy arrays, which are basically lists of numbers (or some other type of data).

In [2]:
# Import the numpy library and refer to it by the name 'np'
import numpy as np

# Create a numpy array from a list
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Print the array
print(x)

[ 1  2  3  4  5  6  7  8  9 10]


## Numpy Array Properties

An array is basically a list of items, similar to Python lists and tuples. One key feature of `numpy` arrays is that the items must all be the same data type, for example `int` or `float`.
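
To see the single-data-type rule in action, here is a minimal sketch (the variable names `a` and `b` are just for illustration). When a list mixes whole numbers and decimals, numpy converts every item to `float` so that they all share one type.

```python
import numpy as np

# An array built from whole numbers gets an integer data type
a = np.array([1, 2, 3])
print(a.dtype)

# Mixing whole numbers and decimals: numpy converts every item to float
b = np.array([1, 2.5, 3])
print(b)
print(b.dtype)
```

The `dtype` property reports the data type shared by all items in the array.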

Arrays can be one-dimensional like a list:
```
1 2 3 4 5
```
or they can be multi-dimensional, for example with rows and columns.
```
1 2 3 4 5
0 2 4 6 8
```
The code below gives us information about the properties of our array `x`.

In [3]:
# The size property tells you how many items there are in total.
print(x.size)

# The ndim property tells you how many dimensions - in our case 1.
print(x.ndim)

# The shape property gives the length of each dimension. Our array has only one dimension, so the shape holds a single number: (10,).
print(x.shape)

10
1
(10,)
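
For comparison, here is a small sketch of the same three properties on a two-dimensional array. It uses the rows-and-columns example shown earlier; the name `m` is just for illustration.

```python
import numpy as np

# A two-dimensional array with 2 rows and 5 columns
m = np.array([[1, 2, 3, 4, 5],
              [0, 2, 4, 6, 8]])

print(m.size)   # total number of items
print(m.ndim)   # number of dimensions
print(m.shape)  # (rows, columns)
```

Here the shape has two numbers, one for each dimension: 2 rows and 5 columns.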


## Summarizing Arrays

We can easily get summary values for an array using the following methods:

- `x.mean()` returns the numerical mean (average) for all the values in the array `x`
- `x.max()` returns the maximum value
- `x.min()` returns the minimum
- `x.std()` returns the standard deviation (a measure of spread)

Run the code below to see the value of these for our array `x`.

In [4]:
# Calculate some statistics - average value, standard deviation, max value, min value
print(f"The mean (average) is {x.mean():.2f}")
print(f"The standard deviation is {x.std():.2f}")
print(f"The max value is {x.max():.2f}")
print(f"The min value is {x.min():.2f}")


The mean (average) is 5.50
The standard deviation is 2.87
The max value is 10.00
The min value is 1.00
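
One detail worth knowing about `std()`: by default it divides by the number of items, which gives the population standard deviation. Passing `ddof=1` divides by one less than the number of items, which gives the sample standard deviation. The sketch below compares the two on the same ten values; the name `values` is just for illustration.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

population_std = values.std()      # divides by the number of items
sample_std = values.std(ddof=1)    # divides by the number of items minus 1

print(f"Population standard deviation: {population_std:.2f}")
print(f"Sample standard deviation: {sample_std:.2f}")
```

Which one you want depends on whether your data is the whole group you care about or a sample drawn from a larger group.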


## Modifying Arrays

With `numpy`, it is easy to change all the values in an array with a line of code. Some examples are shown below.

In [5]:
# Increase every value by 1
print("Original array:", x)
x = x + 1
print("Array with values increased by 1:",x)

Original array: [ 1  2  3  4  5  6  7  8  9 10]
Array with values increased by 1: [ 2  3  4  5  6  7  8  9 10 11]


In [6]:
# Multiply every value by 3 and add 1
print(x)
x = 3*x + 1
print(x)

[ 2  3  4  5  6  7  8  9 10 11]
[ 7 10 13 16 19 22 25 28 31 34]


In [7]:
# Take the square root of each number in the array
print(np.sqrt(x))

[2.64575131 3.16227766 3.60555128 4.         4.35889894 4.69041576
 5.         5.29150262 5.56776436 5.83095189]
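
To show why this is convenient, the sketch below does the same arithmetic on a plain Python list and on a numpy array. The names `numbers`, `list_result` and `array_result` are just for illustration.

```python
import numpy as np

numbers = [1, 2, 3, 4]

# With a plain list, you need a loop or a list comprehension to change every item
list_result = [3 * n + 1 for n in numbers]
print(list_result)

# With a numpy array, the same arithmetic applies to every item at once
array_result = 3 * np.array(numbers) + 1
print(array_result)
```

Note that writing `3 * numbers` with a plain list repeats the list three times rather than multiplying its items, which is another reason to convert to an array first.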


## Exercise 1

Copy and paste the following list called `num_list` into the code block below. Then answer the questions below.
```
num_list = [3, -2, 7, 10, 24, -15]
```
1. Use the code `y = np.array(num_list)` to store the list in a numpy array called `y`.
2. Print the size, number of dimensions and shape of the array `y`.
3. Print the maximum and minimum values of the array.
4. Create a new array called `z` whose values are equal to 5 times the values in the array `y`.
5. Print the mean of all the values in the array `z`.

In [None]:
# Type your code here


## Combining and Slicing Arrays

It is easy to combine two or more numpy arrays. You can also use indexing to reference parts of an array. This is very similar to the indexing we've used to reference parts of a string or list.

Run the code below to see some examples.

In [8]:
# Create two different numpy arrays called p and q
p = np.array([1, 2, 3, 4, 5])
q = np.array([0, 4, 5, 3, -1])

# Add the elements of the arrays together
print(p+q)

# Combine the arrays by adding 2 times each value in p to 3 times the matching value in q
print(2*p+3*q)

# Slice the array q so that we keep the first 3 elements
r = q[0:3]
print(r)

[1 6 8 7 4]
[ 2 16 21 17  7]
[0 4 5]
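
Here are a few more indexing and slicing patterns, sketched on the same array `q`. The names `first`, `last`, `middle` and `last_two` are just for illustration.

```python
import numpy as np

q = np.array([0, 4, 5, 3, -1])

first = q[0]        # a single item: indexing starts at 0
last = q[-1]        # negative indexes count from the end
middle = q[1:4]     # items at positions 1, 2 and 3 (the end position is excluded)
last_two = q[-2:]   # the final two items

print(first, last)
print(middle)
print(last_two)
```

These follow the same rules as slicing a Python list or string.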


## Exercise 2

Here are two numpy arrays:
```
arr1 = np.array([5, 7, -2, 9, 10, 12])
arr2 = np.array([-3, 4, 8, 11, 0, -5])
```

Write the code to create a third numpy array, `arr3`, which contains the sum of the elements of `arr1` and `arr2` but includes only the first four elements, dropping the last two.

In [9]:
# Type your code here


## Loading Data from a File

You can load data from a CSV (Comma-Separated Values) file and store it in a numpy array.

Let's load some test scores from the file `test_scores.csv`. This file has 3 rows of data, one for each of
3 different tests taken by a class of 12 students. The first test was out of 10, the second out of 20 and the third out of 100. Each column gives the scores for a particular student.

In [10]:
file = "test_scores.csv"
d = np.loadtxt(file, delimiter=",")
print(d)

[[ 1.  3.  3.  9.  1. 10.  9.  7.  7.  4. 10.  9.]
 [ 9. 20. 13.  7.  1. 13. 14.  1. 16. 18.  9.  4.]
 [97. 75.  5. 72. 76. 24. 12. 94. 67. 21. 85. 67.]]


In [11]:
# The shape of the array d will tell you there are 3 rows each with 12 columns.
d.shape

(3, 12)
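
If you don't have a CSV file to hand, you can try `np.loadtxt` on text held in memory. The sketch below uses a small made-up piece of CSV text (not the real test scores) wrapped in `io.StringIO`, which makes a string behave like a file.

```python
import io
import numpy as np

# Hypothetical CSV text standing in for a file on disk: 2 rows, 3 columns
csv_text = "1,2,3\n4,5,6\n"

# np.loadtxt accepts a file name or a file-like object such as StringIO
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(data)
print(data.shape)
print(data.dtype)
```

By default `np.loadtxt` stores the values as `float`, which is why the test scores above are printed with a decimal point.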

In [19]:
# We'll put each row in its own array corresponding to the test number
test1 = d[0]
test2 = d[1]
test3 = d[2]
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)

Test 1 results: [ 1.  3.  3.  9.  1. 10.  9.  7.  7.  4. 10.  9.]
Test 2 results: [ 9. 20. 13.  7.  1. 13. 14.  1. 16. 18.  9.  4.]
Test 3 results: [97. 75.  5. 72. 76. 24. 12. 94. 67. 21. 85. 67.]
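
Indexing with a single number picks out a row. To pick out a column instead, use a colon for the rows and a number for the column. The sketch below uses a small made-up array of scores (not the data from the file) to show both.

```python
import numpy as np

# Hypothetical scores: 3 tests (rows) for 4 students (columns)
scores = np.array([[ 7.,  9.,  4., 10.],
                   [15., 12., 18.,  6.],
                   [80., 55., 91., 73.]])

test_2 = scores[1]        # one row: every student's score on the second test
student_1 = scores[:, 0]  # one column: the first student's score on each test

print(test_2)
print(student_1)
```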


## Some Data Processing

Let's do some data processing on these test scores. 

1. Firstly, we want to remove the last two students from each set of scores as they dropped the class.
2. Secondly, we want to rescale the first two scores so they are out of 100.
3. Thirdly, we will add scores together so we get a total score for each student over the 3 tests.
4. Finally, we will divide all the scores by 3 so we get an average out of 100.


In [20]:
# Drop the last two students' scores from each test and print the results
test1 = test1[0:10]
test2 = test2[0:10]
test3 = test3[0:10]
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)

Test 1 results: [ 1.  3.  3.  9.  1. 10.  9.  7.  7.  4.]
Test 2 results: [ 9. 20. 13.  7.  1. 13. 14.  1. 16. 18.]
Test 3 results: [97. 75.  5. 72. 76. 24. 12. 94. 67. 21.]


In [21]:
# Rescale tests 1 and 2 so they are out of 100 and print the result
test1 = test1*10
test2 = test2*5
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)

Test 1 results: [ 10.  30.  30.  90.  10. 100.  90.  70.  70.  40.]
Test 2 results: [ 45. 100.  65.  35.   5.  65.  70.   5.  80.  90.]
Test 3 results: [97. 75.  5. 72. 76. 24. 12. 94. 67. 21.]


In [22]:
# Get a total score for each student
total = test1+test2+test3
print(total)

[152. 205. 100. 197.  91. 189. 172. 169. 217. 151.]


In [23]:
# Get an average score for each of the students
total_average = total/3
print(total_average)

[50.66666667 68.33333333 33.33333333 65.66666667 30.33333333 63.
 57.33333333 56.33333333 72.33333333 50.33333333]


In [24]:
# We don't like all the decimals, so we will round each score to 1 decimal place
tot_ave_rnd = np.around(total_average, decimals=1)
print(tot_ave_rnd)

[50.7 68.3 33.3 65.7 30.3 63.  57.3 56.3 72.3 50.3]


In [25]:
# Finally, let's work out the lowest, highest and mean of the average scores for the 10 students
print(tot_ave_rnd.min())
print(tot_ave_rnd.max())
print(tot_ave_rnd.mean())

30.3
72.3
54.75
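
As an alternative, the same processing can be sketched on the two-dimensional array directly, without splitting it into three separate arrays. This is only one possible approach. It relies on numpy stretching the column of multipliers in `scale` across every student, and on `mean(axis=0)`, which averages down each column. The names `scores`, `kept`, `scale`, `scaled` and `average` are just for illustration.

```python
import numpy as np

# The scores printed earlier in this notebook, typed in directly so the
# example does not depend on test_scores.csv being available
scores = np.array([
    [1, 3, 3, 9, 1, 10, 9, 7, 7, 4, 10, 9],
    [9, 20, 13, 7, 1, 13, 14, 1, 16, 18, 9, 4],
    [97, 75, 5, 72, 76, 24, 12, 94, 67, 21, 85, 67],
], dtype=float)

kept = scores[:, 0:10]               # drop the last two students (columns)
scale = np.array([[10], [5], [1]])   # one multiplier per test (row)
scaled = kept * scale                # rescale every test to be out of 100
average = scaled.mean(axis=0)        # average the 3 tests for each student

print(np.around(average, decimals=1))
```

The result should match the rounded averages calculated step by step above.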


## Exercise 3

We'll use what we have learned to analyze some real data. The file "Temp_Data_Denver_December_2025.csv" contains the daily high and low temperatures for Denver for the month of December 2025.
