# NumPy Exercise

## Instructions
The same instructions from previous exercises apply.

Write the Python program in the cell below the exercise text (or create a new one). Always print the final result to verify that the solution is correct. Some exercises are followed by a code cell that illustrates important concepts; run it and modify it as needed to make sure you understand them.

## Submission
The same rules from previous exercises apply.

It is mandatory to **submit the solution for all exercises** (except for those marked as optional) **before the beginning of the next lesson** in the appropriate assignment on iCorsi. To submit:
- Run the entire notebook from scratch (`Kernel -> Restart & Run All`) and ensure that the solutions are as expected;
- Export the notebook in HTML format (`File -> Download as...`) and submit the resulting file.

If you were unable to complete one or more exercises, describe the problem encountered and **still submit the file with the rest of the solutions**.


In [66]:
# Import packages for later use
import pandas as pd
import numpy as np
import os

In [67]:
# Solution
array = np.arange(10, 50)
print(array)

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]


In [81]:
matrice = np.random.rand(5,5)
print(matrice)

[[0.0586875  0.10049055 0.95489248 0.33559864 0.35627031]
 [0.4164043  0.56966898 0.79206862 0.43001347 0.19752532]
 [0.53153703 0.44525951 0.81477883 0.25742078 0.36499627]
 [0.90184671 0.19988151 0.26860622 0.26181502 0.63191188]
 [0.7619642  0.26787676 0.25000589 0.65620728 0.38864522]]


In [82]:
# Use distinct names so the built-in min() and max() are not shadowed
min_value = np.min(matrice)
max_value = np.max(matrice)
print(f"Min: {min_value}, Max: {max_value}")

Min: 0.058687501839863154, Max: 0.9548924799823754


In [83]:
frame = np.ones((5, 5))
frame[1:-1, 1:-1] = 0
print(frame)

[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]


In [84]:
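def zeroFrame(arr, n=2):
    # Pad `arr` with n // 2 zeros on each side
    half = n // 2
    tmp = np.zeros(len(arr) + 2 * half)
    # An explicit end index avoids the empty slice that `-0` gives when half == 0
    tmp[half:half + len(arr)] = arr
    return tmp
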
arr = zeroFrame(array)
print(arr)

[ 0. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26.
 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.
 45. 46. 47. 48. 49.  0.]


In [85]:
checkerboard = np.zeros((8, 8), dtype=bool)
checkerboard[1::2, ::2] = True
checkerboard[::2, 1::2] = True
print(checkerboard)
print()

[[False  True False  True False  True False  True]
 [ True False  True False  True False  True False]
 [False  True False  True False  True False  True]
 [ True False  True False  True False  True False]
 [False  True False  True False  True False  True]
 [ True False  True False  True False  True False]
 [False  True False  True False  True False  True]
 [ True False  True False  True False  True False]]



In [86]:
def negate_between_5_and_8(arr):
    arr = np.array(arr)
    mask = (arr >= 5) & (arr <= 8)
    arr[mask] = -arr[mask]
    return arr
print(np.arange(1,11))
print(negate_between_5_and_8(np.arange(1,11)))

[ 1  2  3  4  5  6  7  8  9 10]
[ 1  2  3  4 -5 -6 -7 -8  9 10]


In [87]:
def find_closest_index(arr, x):
    arr = np.array(arr)
    index = np.argmin(np.abs(arr - x))
    return index

# Example usage
arr = [-1, 3, 6, 9, 5]
closest_index = find_closest_index(arr, 2)
print(closest_index)

1


## Exercise 1

Let's consider the online dating profiles dataset, which we mentioned in class.

Download the zip files from [this link](https://github.com/rudeboybert/JSE_OkCupid), unzip them, and place the files `profiles_revised.csv` and `essays_revised_and_shuffled.csv` in a `data` subdirectory in the current directory. Then, run the following cell, which uses the pandas library (which we will cover in detail later) to parse the CSV.

The `data` subdirectory of the current directory, where we expect to find the files, is this:

In [91]:
print(os.path.join(os.getcwd(), "data"))

/Users/lucamazza/School/datascience/data


In [92]:
df = pd.read_csv("data/profiles_revised.csv")
age = df["age"].values
sex = df["sex"].values

dfe = pd.read_csv("data/essays_revised_and_shuffled.csv")
essay = dfe["essay0"].values

### 1.1

Read the [codebook](https://github.com/rudeboybert/JSE_OkCupid/blob/master/okcupid_codebook_revised.txt) for the dataset, focusing on the three columns "age", "sex", and "essay0".

Explain the type of the `age`, `sex`, and `essay` arrays; visualize some elements from them.
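As a minimal sketch of how an array's type can be inspected, the cell below uses small made-up stand-in arrays (not values from the dataset). It assumes that pandas returns text columns as NumPy arrays with `object` dtype, which may differ depending on the pandas version and settings, so check the real arrays rather than relying on this.

```python
import numpy as np

# Hypothetical stand-in arrays; the real ones come from the CSV files
age_demo = np.array([22, 36, 37])
sex_demo = np.array(["m", "f", "m"], dtype=object)  # assumed dtype for text columns

for name, arr in [("age_demo", age_demo), ("sex_demo", sex_demo)]:
    # dtype describes the array as a whole, type() describes a single element
    print(name, arr.dtype, arr.shape, type(arr[0]))

# Slicing shows a few elements without printing the whole array
print(age_demo[:2], sex_demo[-2:])
```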

In [99]:
# Solution 1.1
print(age)
print(sex)


[22 36 37 ... 41 26 40]
['m' 'm' 'm' ... 'm' 'm' 'm']


### 1.2

Analyze the arrays with the numpy methods you know and answer the following questions:
- What is the average, minimum, and maximum age of the users?
- How many are male, and how many are female?
- Does the average age of males differ significantly from the average age of females?
- How long are their introductions on average?
- Show the longest introduction.
- Can we determine if males tend to write more compared to females?
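The techniques these questions call for can be sketched on tiny made-up arrays (the values below are invented for illustration and are not results from the dataset). The sketch assumes that missing essays appear as non-string values such as `nan`, and counts them as length 0; that choice is an assumption, and skipping them instead would change the average.

```python
import numpy as np

# Hypothetical stand-in data, NOT taken from the real files
age_demo = np.array([22, 36, 37, 41, 26, 40])
sex_demo = np.array(["m", "f", "m", "f", "f", "m"], dtype=object)
essay_demo = np.array(["hello there", np.nan, "I like hiking and cooking", "hi"],
                      dtype=object)

# Basic statistics on a numeric array
print(age_demo.mean(), age_demo.min(), age_demo.max())

# Boolean masks select one group at a time
is_male = sex_demo == "m"
is_female = sex_demo == "f"
n_male = np.count_nonzero(is_male)
n_female = np.count_nonzero(is_female)
mean_age_male = age_demo[is_male].mean()
mean_age_female = age_demo[is_female].mean()
print(n_male, n_female, mean_age_male, mean_age_female)

# Text lengths: non-string entries (assumed to mark missing essays) count as 0
lengths = np.array([len(e) if isinstance(e, str) else 0 for e in essay_demo])
print(lengths.mean())
print(essay_demo[np.argmax(lengths)])
```

Note that comparing essay length by sex only makes sense if row `i` of the essays file describes the same user as row `i` of the profiles file; it is worth checking the codebook on this point before drawing conclusions.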

In [None]:
# Solution 1.2


## Exercise 2

### 2.1
Write a function `primeNumbers(N)` that takes as an (optional) input argument an integer `N` (default value `N=1000`) and returns an array with all prime numbers less than or equal to `N` (neither 0 nor 1 should be considered).
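One possible approach, shown here only as a sketch, is the sieve of Eratosthenes expressed with a Boolean mask and slice assignment. The name `prime_sieve_sketch` is hypothetical and chosen to keep it distinct from the requested `primeNumbers`; other approaches (for example trial division) are equally valid for this exercise.

```python
import math
import numpy as np

def prime_sieve_sketch(N=1000):
    """Return an array of all primes <= N using a Boolean-mask sieve."""
    if N < 2:
        return np.array([], dtype=int)
    is_prime = np.ones(N + 1, dtype=bool)
    is_prime[:2] = False  # 0 and 1 are not prime
    for p in range(2, math.isqrt(N) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting from p * p
            is_prime[p * p::p] = False
    # Indices that are still True are the primes
    return np.flatnonzero(is_prime)

print(prime_sieve_sketch(30))
```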

In [None]:
# Solution 2.1


### 2.2
Write a function `createMyDictionary(N)` that takes as an (optional) input argument an integer `N` (default value `N=1000`) and returns a dictionary with two elements. The first element has the key "middle", associated with an array of all prime numbers between $\frac{1}{4}N$ and $\frac{3}{4}N$ (inclusive). The second element has the key "extremes" and contains the remaining prime numbers less than $N$. Hint: Use the previously created function `primeNumbers(N)`.

In [None]:
# Hint

# Remember that the element-wise logical AND (or OR) between two Boolean arrays
# is performed in NumPy using the operator `&` (or `|` respectively).
# Wrap each comparison in parentheses, because `&` and `|` bind tighter than `>` or `==`.

array1 = np.arange(10)
print(f"array1: {array1}")

mask1 = (array1 > 2) & (array1 < 8)
print("mask1: ", mask1)

mask2 = (array1[0:3] == 2) | (array1[0:3] == 1)
print("mask2: ", mask2)
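Building on the hint above, a mask and its negation (`~mask`) split an array into two complementary parts. The sketch below uses a generic array and hypothetical names (`values`, `low`, `high`, `parts`); it illustrates the masking idea only and is not the requested function.

```python
import numpy as np

values = np.arange(10)
low, high = 3, 6  # hypothetical inclusive bounds

# True where the element lies inside the closed interval [low, high]
inside = (values >= low) & (values <= high)

# `~inside` flips the mask, so the two parts never overlap
parts = {"inside": values[inside], "outside": values[~inside]}
print(parts)
```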

In [None]:
# Solution 2.2
