# Fundamentals of Data Analysis Tasks

**Linda Grealish**

***

## Table of contents
 * [Task 1](#task-1)
 * [Task 2](#task-2)

## Task 1

The Collatz conjecture is a famous unsolved problem in mathematics. The problem is to prove that if you start with any positive integer $x$ and repeatedly apply the function $f(x)$ below, you always get stuck in the repeating sequence 1, 4, 2, 1, 4, 2, ...

$$ f(x) = \begin{cases}
x\div 2 & \text{if $x$ is even} \\
3x+1 & \text{otherwise}
\end{cases} $$

For example, starting with the value 10, which is an even number, we divide it by 2 to get 5. Since 5 is an odd number, we multiply it by 3 and add 1 to get 16. Then we repeatedly divide by 2 to get 8, 4, 2, 1. Once we are at 1, we go back to 4 and get stuck in the repeating sequence 4, 2, 1, as we suspected.

Your task is to verify, using Python, that the conjecture is true for the first 10000 positive integers.

First we define the function $f(x)$, which takes a positive integer $x$. The function checks whether the integer is odd or even using the modulo operator %. If it is even, it divides x by 2 using floor division //, and if it is odd, it returns the result of (3 * x) + 1.

In [None]:
def f(x):
  # If x is even, divide it by 2.
  if x % 2 == 0:
    return x // 2
  else:
    # Otherwise x is odd, so multiply it by 3 and add 1.
    return (3 * x) + 1
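
As a quick illustration of the two operators, the sketch below repeats the definition of *f* so that it runs on its own. It compares floor division // with ordinary division /, and checks that starting from 1 we return to 1 after three steps.

```python
def f(x):
    # Same rule as above: halve even numbers, otherwise return 3x + 1.
    if x % 2 == 0:
        return x // 2
    else:
        return (3 * x) + 1

# The modulo operator gives the remainder after dividing by 2.
print(10 % 2)   # 0, so 10 is even
print(5 % 2)    # 1, so 5 is odd

# Floor division keeps the result an integer; ordinary division returns a float.
print(10 // 2)  # 5
print(10 / 2)   # 5.0

# Starting from 1, three applications of f bring us back to 1.
cycle = [f(1), f(f(1)), f(f(f(1)))]
print(cycle)    # [4, 2, 1]
```

Using // means every value in the sequence stays an integer, so the x % 2 test keeps working on whole numbers.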

Next we define the function collatz(). It uses a *while* loop that runs until the value reaches 1 and prints the values as a comma-separated list. On each pass the loop calls *f(x)*, whose if and else statements decide what should happen depending on whether the number is odd or even.

In [None]:
def collatz(x):
  print(f'Testing Collatz with initial value {x}')
  # Keep applying f until the sequence reaches 1.
  while x != 1:
    print(x, end=', ')
    x = f(x)
  # Print the final value, 1, and end the line.
  print(x)


*collatz()* is a Python function that takes a specific integer and prints its Collatz sequence as a comma-separated list. The sequence for 1000 is shown below. It can be run for any positive integer by changing the value in the parentheses: *collatz(number)*.

In [None]:
collatz(1000)

To satisfy the second part of the task we define a function *collatz_sequence* that takes the starting integer and generates its Collatz sequence, appending the result of each iteration to a list.

In [None]:
def collatz_sequence(x):
    sequence = [x]
    while x != 1:
        if x % 2 == 0:
            x = x // 2
        else:
            x = 3 * x + 1
        sequence.append(x)
    return sequence
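
As a small check, the worked example from the start of this task can be reproduced with this function. The sketch below repeats the definition so that it runs on its own.

```python
def collatz_sequence(x):
    # Record the starting value, then every value until we reach 1.
    sequence = [x]
    while x != 1:
        if x % 2 == 0:
            x = x // 2
        else:
            x = 3 * x + 1
        sequence.append(x)
    return sequence

seq = collatz_sequence(10)
print(seq)           # [10, 5, 16, 8, 4, 2, 1]
print(len(seq) - 1)  # 6 steps to reach 1
```

Returning a list rather than printing makes it possible to inspect the result afterwards, for example to count the steps.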

The final part uses the *range* function to generate all of the integers from 1 to 10,000 inclusive and applies *collatz_sequence(x)* to each one.

In [None]:
for number in range(1, 10001):
    print(f"collatz_sequence {number} is {collatz_sequence(number)}")

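If we run the above code, we can see that for each of the integers from 1 to 10,000 inclusive the sequence reaches 1, so the conjecture holds for every one of these starting values. Because *collatz_sequence* only returns once x reaches 1, the fact that the loop finishes at all confirms this.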
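
Ten thousand printed sequences are hard to inspect by eye, so the same check can also be done programmatically. The following is a minimal sketch, not part of the original solution: *reaches_one* and its *max_steps* cap are names introduced here for illustration, and the cap is an assumption added so that the function cannot loop forever.

```python
def reaches_one(x, max_steps=10_000):
    """Return True if x reaches 1 within max_steps applications of f."""
    for _ in range(max_steps):
        if x == 1:
            return True
        # Apply the Collatz rule: halve even numbers, otherwise 3x + 1.
        x = x // 2 if x % 2 == 0 else 3 * x + 1
    return x == 1

# Collect any starting values that fail to reach 1 within the cap.
failures = [n for n in range(1, 10001) if not reaches_one(n)]
print(failures)  # an empty list means every starting value reached 1
```

An empty list of failures gives the same conclusion as the printed output above, without having to read through every sequence.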

***

## Task 2

Give an overview of the famous penguins data set, explaining the types of variables it contains. Suggest the types of variables that should be used to model them in Python, explaining your rationale.

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

The palmerpenguins package contains two datasets. A CSV copy of the penguins data is available at
https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv
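
As a minimal sketch of how the variables might be modelled in Python, the example below assumes the seven column names used in penguins.csv and maps each one to a built-in type. The mapping is one possible choice rather than a definitive answer, the two rows are made-up placeholders rather than real records, and the sketch assumes missing measurements appear as empty fields.

```python
import csv
import io

# Made-up placeholder rows using the column names from penguins.csv;
# these are NOT real records from the dataset.
SAMPLE = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,40.0,18.0,190,3700,MALE
Gentoo,Biscoe,,,,,
"""

# One possible mapping from column name to Python type (an assumption).
COLUMN_TYPES = {
    "species": str,            # categorical
    "island": str,             # categorical
    "bill_length_mm": float,   # continuous measurement
    "bill_depth_mm": float,    # continuous measurement
    "flipper_length_mm": int,  # whole millimetres
    "body_mass_g": int,        # whole grams
    "sex": str,                # categorical
}

def parse_row(row):
    """Convert each field to its mapped type; empty fields become None."""
    return {
        name: COLUMN_TYPES[name](value) if value != "" else None
        for name, value in row.items()
    }

penguins = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
for penguin in penguins:
    print(penguin)
```

The rationale in this sketch is that the categorical variables (species, island, sex) take a small set of labels and so fit strings, the bill measurements carry a decimal place and so fit floats, and the flipper length and body mass are whole numbers and so fit integers. Using None for empty fields keeps missing values distinct from zero.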


<img src="https://github.com/allisonhorst/palmerpenguins/raw/main/man/figures/lter_penguins.png" alt="Image of Palmer Penguins">

***
## End