# NumPy Exercise

## Instructions
The same instructions from previous exercises apply.

Write the Python program in the cell below the text of the exercise (or create a new one). Always print the final result to the screen so that the correctness of the solution can be verified. Sometimes we provide important concepts in a code cell below the exercise text; try running it, and modify it if necessary, to make sure you understand the required concepts.

## Submission
The same rules from previous exercises apply.

It is mandatory to **submit the solution for all exercises** (except for those marked as optional) **before the beginning of the next lesson** in the appropriate assignment on iCorsi. To submit:
- Run the entire notebook from scratch (`Kernel -> Restart & Run All`) and ensure that the solutions are as expected;
- Export the notebook in HTML format (`File -> Download as...`) and submit the resulting file.

If you were unable to complete one or more exercises, describe the problem encountered and **still submit the file with the rest of the solutions**.


In [1]:
# Import packages for later use
import pandas as pd
import numpy as np
import os

## Exercise 0
Various warmup exercises

- Create a 1D array with values ranging from 10 to 49 (use `np.arange`, find its docs!)
- Create a 5x5 array with random values and find the minimum and maximum values (use `np.random.rand`, `np.min`, `np.max`)
- Create a 2D array with 1 on the border and 0 inside
- Write a function that takes a 1D array as input and returns a new array with the same values, plus a 0 added before the first element and after the last one. E.g. `[1,2,3]` becomes `[0,1,2,3,0]`
- Same as above, but take a parameter `n` which indicates how many zeros to place.
- Create an 8x8 boolean array and fill it with a checkerboard pattern
- Given a 1D array of numbers, return a new array that contains the same values, except that all values between 5 and 8 (inclusive) are negated. E.g. `[-1,3,6,9,5]` becomes `[-1,3,-6,9,-5]`
- Given a 1D array of numbers `a` and a number `x`, return the index of the element in `a` that is closest to `x`. E.g. `a=[-1,3,6,9,5], x=2` returns `1`, because `a[1]=3` is closest to `2`.
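
The warmups above mostly combine a few basic NumPy building blocks. The cell below is a minimal sketch of those building blocks on unrelated toy values (all arrays and numbers here are made up for illustration). It is not a solution to any of the items.

```python
import numpy as np

# np.arange uses a half-open interval: the stop value is excluded.
a = np.arange(3, 8)
print(a)

# Slices can be assigned to, which is handy for filling a region of an array.
grid = np.zeros((4, 4), dtype=int)
grid[1:-1, 1:-1] = 7         # fill only the interior cells
print(grid)

# Step slicing selects every other element.
row = np.zeros(6, dtype=bool)
row[::2] = True
print(row)

# A boolean mask selects (and can modify) the elements matching a condition.
b = np.array([4, -2, 10, 7])
doubled = b.copy()           # copy so that the input array is left untouched
doubled[b > 5] *= 2
print(doubled)

# np.argmin returns the index of the smallest element.
distances = np.array([3.0, 0.5, 2.0])
print(np.argmin(distances))
```

Working on a copy, as done for `doubled`, matters whenever an exercise asks for a *new* array: modifying the input in place would also change the caller's data.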

In [None]:
# Solution 0


## Exercise 1

Let's consider the online dating profiles dataset, which we mentioned in class.

Download the zip files from [this link](https://github.com/rudeboybert/JSE_OkCupid), unzip them, and place the files `profiles_revised.csv` and `essays_revised_and_shuffled.csv` in a `data` subdirectory in the current directory. Then, run the following cell, which uses the pandas library (which we will cover in detail later) to parse the CSV.

The directory where we expect to find the files is this:

In [None]:
print(os.path.join(os.getcwd(), "data"))

In [5]:
df = pd.read_csv("data/profiles_revised.csv")
age = df["age"].values
sex = df["sex"].values

dfe = pd.read_csv("data/essays_revised_and_shuffled.csv")
essay = dfe["essay0"].values

### 1.1

Read the [codebook](https://github.com/rudeboybert/JSE_OkCupid/blob/master/okcupid_codebook_revised.txt) for the dataset, focusing on the three columns "age", "sex", and "essay0".

Explain the type of the `age`, `sex`, and `essay` arrays; visualize some elements from them.
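
As a starting point, the sketch below shows one way to inspect an array's type information. It uses small toy arrays with made-up values instead of the real columns. It also assumes that text columns arrive as `object` arrays and that missing text shows up as `NaN`; this is common but depends on the pandas version, so check it against the actual arrays.

```python
import numpy as np

# Toy stand-ins for the real columns (hypothetical values, not from the dataset).
toy_age = np.array([22, 35, 41])
toy_sex = np.array(["m", "f", "m"], dtype=object)
toy_essay = np.array(["hello there", np.nan, "hi"], dtype=object)

# Print the container type, element dtype, shape, and the first two elements.
for name, arr in [("age", toy_age), ("sex", toy_sex), ("essay", toy_essay)]:
    print(name, type(arr), arr.dtype, arr.shape, arr[:2])

# In an object array, each element keeps its own Python type.
element_types = [type(v).__name__ for v in toy_essay]
print(element_types)
```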

In [None]:
# Solution 1.1


### 1.2

Analyze the arrays with the NumPy methods you know and answer the following questions:
- What is the average, minimum, and maximum age of the users?
- How many are male, and how many are female?
- Does the average age of males differ significantly from the average age of females?
- How long are their introductions on average?
- Show the longest introduction.
- Can we determine if males tend to write more compared to females?
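
Several of the questions above boil down to computing a statistic on one array restricted by a condition on another. The sketch below illustrates that pattern on hypothetical toy data. It assumes, without having checked the real file, that some introductions may be missing and appear as `NaN`. Deciding what counts as a "significant" difference is left to you.

```python
import numpy as np

# Hypothetical toy data, not taken from the dataset.
toy_age = np.array([20, 30, 40, 50])
toy_sex = np.array(["m", "f", "m", "f"], dtype=object)
toy_essay = np.array(["abc", np.nan, "hello", "hi"], dtype=object)

# Mean of one array restricted to the rows where another array matches.
is_m = toy_sex == "m"
print(toy_age[is_m].mean(), toy_age[~is_m].mean())

# Counting how often each distinct value occurs.
values, counts = np.unique(toy_sex, return_counts=True)
print(dict(zip(values, counts)))

# Text length, treating non-string entries (e.g. NaN) as missing.
lengths = np.array([len(t) if isinstance(t, str) else np.nan for t in toy_essay])
print(np.nanmean(lengths))               # mean that ignores missing entries
print(toy_essay[np.nanargmax(lengths)])  # longest available text
```

`np.nanmean` and `np.nanargmax` skip `NaN` entries; the plain `np.mean` would return `NaN` as soon as one entry is missing.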

In [None]:
# Solution 1.2


## Exercise 2

### 2.1
Write a function `primeNumbers(N)` that takes as an (optional) input argument an integer `N` (default value `N=1000`) and returns an array with all prime numbers less than or equal to `N` (neither 0 nor 1 should be considered).
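
To verify your result, it can help to compare it against a slow but simple reference. The sketch below uses plain trial division; the helper names `is_prime_slow` and `check_primes` are hypothetical and not part of the exercise. It is meant as a checking aid, not as the NumPy-based solution the exercise asks for.

```python
def is_prime_slow(k):
    """Return True if k is prime, using plain trial division (slow but simple)."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def check_primes(candidate, N):
    """Compare a candidate sequence with the trial-division reference up to N (inclusive)."""
    expected = [k for k in range(2, N + 1) if is_prime_slow(k)]
    return list(candidate) == expected


print(check_primes([2, 3, 5, 7], 10))     # a correct list for N=10
print(check_primes([2, 3, 5, 7, 9], 10))  # 9 is not prime
```

Once `primeNumbers` is written, a call such as `check_primes(primeNumbers(100), 100)` should return `True`.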

In [None]:
# Solution 2.1


### 2.2
Write a function `createMyDictionary(N)` that takes as an (optional) input argument an integer `N` (default value `N=1000`) and returns a dictionary with two elements. The first element has the key "middle", associated with an array of all prime numbers between $\frac{1}{4}N$ and $\frac{3}{4}N$ (inclusive). The second element has the key "extremes" and contains the remaining prime numbers less than $N$. Hint: Use the previously created function `primeNumbers(N)`.

In [None]:
# Hint

# Remember that the element-wise logical AND (or OR) between two Boolean arrays
# (or two Boolean variables) is computed with the operator `&` (or `|`, respectively).
# Wrap each comparison in parentheses: `&` and `|` bind more tightly than `<`, `>`, and `==`.

array1 = np.arange(10)
print(f"array1: {array1}")

mask1 = (array1 > 2) & (array1 < 8)
print("mask1: ", mask1)

mask2 = (array1[0:3] == 2) | (array1[0:3] == 1)
print("mask2: ", mask2)

In [None]:
# Solution 2.2
