## Task 1 - The Collatz Conjecture
***

The Collatz conjecture is a famous unsolved problem in mathematics [1]. The problem is to prove that if you start with any positive integer $x$ and repeatedly apply the function $f(x)$ below, you always end up stuck in the repeating sequence 1, 4, 2, 1, 4, 2, ...


$$
f(x) =
\begin{cases}
x \div 2 & \text{if } x \text{ is even}\\
3x + 1 & \text{if } x \text{ is odd}
\end{cases}
$$

For example, starting with the value 10, which is an even number, we divide it by 2 to get 5. Then 5 is an odd number, so we multiply by 3 and add 1 to get 16. Then we repeatedly divide by 2 to get 8, 4, 2, 1. Once we are at 1, applying $f$ again takes us back to 4, and we are stuck in the repeating sequence 4, 2, 1, as expected.
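The worked example for 10 can be reproduced with a short loop that applies $f(x)$ until it reaches 1. This is a minimal sketch. The helper name `collatz_sequence` is hypothetical and does not come from the methods below.

```python
def collatz_sequence(x):
    """Return the Collatz sequence from x down to the first 1 (hypothetical helper)."""
    if x < 1:
        raise ValueError("x must be a positive integer")  # 0 would loop forever
    sequence = [x]
    while x != 1:
        # f(x): halve even numbers, send odd numbers to 3x + 1
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        sequence.append(x)
    return sequence

print(collatz_sequence(10))  # [10, 5, 16, 8, 4, 2, 1]
```

The loop stops at the first 1. Any further steps would only repeat the 4, 2, 1 cycle.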

The task is to verify, using Python, that the conjecture is true for the first 10,000 positive integers.
***

### Task 1 Introduction

There are several ways to solve this problem. However, the modulo operator `%` cannot be applied to a whole `range` (or list) of numbers at once. Trying to do so raises the error `TypeError: unsupported operand type(s) for %: 'range' and 'int'`, so each number has to be tested individually.
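A small illustration of the error and the usual workaround follows. It applies `%` to each element of the range, here with a list comprehension, instead of to the range object itself. A plain list fails in the same way, with `'list'` in place of `'range'` in the message.

```python
numbers = range(1, 11)

# Applying % to the whole range fails: % is not defined between a range and an int
try:
    numbers % 2
except TypeError as err:
    message = str(err)
    print(message)   # unsupported operand type(s) for %: 'range' and 'int'

# Applying % to each element individually works
parities = [n % 2 for n in numbers]       # 1 for odd numbers, 0 for even numbers
evens = [n for n in numbers if n % 2 == 0]
print(parities)      # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(evens)         # [2, 4, 6, 8, 10]
```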

### Task 1  - Code

Method 1 - Ask the user to input a positive integer, *<u>or</u>* randomly select a number between 1 and 10000, then generate the sequence using `while`, `if` and `else` statements.

The user inputs a positive integer:

In [1]:
pos_int = int(input("Enter a positive integer: "))
pos_ints = [pos_int]                      # list that will hold the sequence, starting with the input

while pos_int > 1:                        # while the current number is greater than 1
    if pos_int % 2 == 0:                  # if it is an even number
        pos_int = pos_int // 2            # divide by 2 (integer division keeps it an int)
    else:                                 # otherwise it is an odd number
        pos_int = (pos_int * 3) + 1       # multiply by 3 and add 1
    pos_ints.append(pos_int)              # record the new number

for i in pos_ints:
    print(i, end=' ')                     # print the numbers separated by spaces, without commas or brackets


9 28 14 7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1 

Randomly selecting a number between 1 and 10000.

In [2]:
import random

pos_int = random.randint(1, 10000)        # randint includes both end points, so this gives 1 to 10000
print(pos_int)

pos_ints = [pos_int]                      # list that will hold the sequence, starting with the chosen number

while pos_int > 1:                        # while the current number is greater than 1
    if pos_int % 2 == 0:                  # if it is an even number
        pos_int = pos_int // 2            # divide by 2 (integer division keeps it an int)
    else:                                 # otherwise it is an odd number
        pos_int = (pos_int * 3) + 1       # multiply by 3 and add 1
    pos_ints.append(pos_int)              # record the new number

print(pos_ints)                           # print the full list of numbers generated

7442
[7442, 3721, 11164, 5582, 2791, 8374, 4187, 12562, 6281, 18844, 9422, 4711, 14134, 7067, 21202, 10601, 31804, 15902, 7951, 23854, 11927, 35782, 17891, 53674, 26837, 80512, 40256, 20128, 10064, 5032, 2516, 1258, 629, 1888, 944, 472, 236, 118, 59, 178, 89, 268, 134, 67, 202, 101, 304, 152, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1] 

Method 2 - defining a function, `collatz(n)`, and passing it the starting number before running the code [2].

In [3]:
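def collatz(n):
    if n < 1:
        raise ValueError("n must be a positive integer")  # n = 0 would loop forever
    sequence = [n]                        # the sequence starts with n itself
    while n != 1:
        if n % 2 == 0:
            n = n // 2                    # even: divide by 2
        else:
            n = (3 * n) + 1               # odd: multiply by 3 and add 1
        sequence.append(n)
    return sequence

print(collatz(6450))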

[6450, 3225, 9676, 4838, 2419, 7258, 3629, 10888, 5444, 2722, 1361, 4084, 2042, 1021, 3064, 1532, 766, 383, 1150, 575, 1726, 863, 2590, 1295, 3886, 1943, 5830, 2915, 8746, 4373, 13120, 6560, 3280, 1640, 820, 410, 205, 616, 308, 154, 77, 232, 116, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]


$\color{orange}{\text{Example using method 1:}}$ 

Number *<u>randomly</u>* selected: 6450

Output

[6450, 3225, 9676, 4838, 2419, 7258, 3629, 10888, 5444, 2722, 1361, 4084, 2042, 1021, 3064, 1532, 766, 383, 1150, 575, 1726, 863, 2590, 1295, 3886, 1943, 5830, 2915, 8746, 4373, 13120, 6560, 3280, 1640, 820, 410, 205, 616, 308, 154, 77, 232, 116, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

$\color{orange}{\text{Example using method 2:}}$

Number selected: 6450

Output

[6450, 3225, 9676, 4838, 2419, 7258, 3629, 10888, 5444, 2722, 1361, 4084, 2042, 1021, 3064, 1532, 766, 383, 1150, 575, 1726, 863, 2590, 1295, 3886, 1943, 5830, 2915, 8746, 4373, 13120, 6560, 3280, 1640, 820, 410, 205, 616, 308, 154, 77, 232, 116, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]


### Task 1 - Conclusion

Although not every starting value was tested in this notebook, the conjecture is most probably true for the first 10,000 positive integers. No counterexamples were observed using the methods above. The problem has also been widely studied for over 80 years, and it has been verified for extremely large individual numbers such as $2^{100000}-1$ [3].
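The methods above check one starting value at a time. Below is a minimal sketch of an exhaustive check that loops over every starting value from 1 to 10,000. Once a sequence reaches 1 it can only repeat 4, 2, 1, so reaching 1 is treated as confirming the conjecture for that value. The names `reaches_one`, `LIMIT` and `MAX_STEPS` are hypothetical. The step cap is an assumed safety limit so that a runaway sequence could not loop forever.

```python
LIMIT = 10_000      # check every starting value from 1 to 10,000
MAX_STEPS = 1_000   # assumed safety cap on the number of steps per starting value

def reaches_one(n, max_steps=MAX_STEPS):
    """Return True if the Collatz sequence starting at n reaches 1 within max_steps steps."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return False          # treat as a possible counterexample
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return True

failures = [n for n in range(1, LIMIT + 1) if not reaches_one(n)]
print(f"Checked 1 to {LIMIT}: {len(failures)} counterexamples")
```

A faster variant could stop each sequence as soon as it drops below its starting value, because every smaller starting value has already been checked by then.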

***
## Task 2 - Variables of the "Penguin Data Set" 
***

Give an overview of the famous penguins data set, explaining the types of variables it contains. Suggest the types of variables that should be used to model them in Python, explaining your rationale.

mwaskom/seaborn-data: Data repository for seaborn examples. Oct 24 2023. 
url: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv
(visited on 24/10/2023).

In [9]:
import pandas as pd

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"

col_names = ["species", "island", "bill_length_mm" , "bill_depth_mm", "flipper_length_mm", "body_mass_g", "sex"]

missing_values = ["NA", "N/A"]

penguin_data = pd.read_csv(url, names=col_names, na_values=missing_values)

print(penguin_data.describe())



       species  island bill_length_mm bill_depth_mm flipper_length_mm  \
count      345     345            343           343               343   
unique       4       4            165            81                56   
top     Adelie  Biscoe           41.1            17               190   
freq       152     168              7            12                22   

       body_mass_g   sex  
count          343   334  
unique          95     3  
top           3800  MALE  
freq            12   168  


In [10]:
penguin_data.info()   # info() prints its summary directly (and returns None)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 345 entries, 0 to 344
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   species            345 non-null    object
 1   island             345 non-null    object
 2   bill_length_mm     343 non-null    object
 3   bill_depth_mm      343 non-null    object
 4   flipper_length_mm  343 non-null    object
 5   body_mass_g        343 non-null    object
 6   sex                334 non-null    object
dtypes: object(7)
memory usage: 19.0+ KB
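The summary above reports 345 entries, and `species` shows 4 unique values. The data set has 344 penguins of three species, so the extra row is most likely the CSV's own header line. When `names=` is passed to `pd.read_csv`, pandas assumes the file has no header (`header=None`), so the header line is read as data. Its text, such as "bill_length_mm", also stops the measurement columns from being parsed as numbers, which is why every column shows the `object` dtype. One fix, sketched below, is to pass `header=0` so that the existing header line is replaced by `col_names`. To keep the sketch self-contained, it uses a few illustrative rows in the same layout instead of the live URL. These rows are not the full data set.

```python
import io
import pandas as pd

# A few illustrative rows in the same layout as penguins.csv (not the full data set)
csv_text = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,39.5,17.4,186,3800,FEMALE
Adelie,Torgersen,,,,,
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE
"""

col_names = ["species", "island", "bill_length_mm", "bill_depth_mm",
             "flipper_length_mm", "body_mass_g", "sex"]

# header=0: the first line is a header to be replaced by col_names, not a data row
penguin_data = pd.read_csv(io.StringIO(csv_text), names=col_names, header=0)

print(len(penguin_data))   # 4: the header line is no longer counted as a penguin
print(penguin_data.dtypes) # the four measurement columns are now numeric
```

Omitting `names=` altogether would also work here, because the header line already contains the same column names.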


Variables can either be described as categorical or continuous.

- Categorical variables are also known as discrete or qualitative variables. Categorical variables can be further categorized as either nominal, ordinal or dichotomous.

- Continuous variables are also known as quantitative variables. Continuous variables can be further categorized as either interval or ratio variables. [4]
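In pandas, one way to model categorical (nominal or dichotomous) variables is the `category` dtype. It stores each distinct label once and records the column as categorical. Continuous measurements map naturally onto numeric columns. Here is a minimal sketch; the values are hypothetical and are not taken from the data set output above.

```python
import pandas as pd

# Hypothetical sample values for illustration only
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap"],   # categorical - nominal
    "sex": ["MALE", "FEMALE", "FEMALE", None],                 # categorical - dichotomous, one missing
    "body_mass_g": [3750.0, 4500.0, 3800.0, 3500.0],           # continuous
})

# Nominal and dichotomous variables become unordered categoricals;
# the missing sex value stays missing rather than becoming a category
df["species"] = df["species"].astype("category")
df["sex"] = df["sex"].astype("category")

print(df.dtypes)
print(df["species"].cat.categories.tolist())   # ['Adelie', 'Chinstrap', 'Gentoo']
```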

||species |island |bill_length_mm| bill_depth_mm |flipper_length_mm |body_mass_g |  sex|
|---|:---|---|---|---|---|---|---|
|**variable in original data set**|object|object|object|object|object|object|object|
|**proposed python data type**|object/string|object/string|float|float|integer|integer|object/string|
|**variable type for data collection** |categorical - nominal|categorical - nominal|continuous - ratio|continuous - ratio|continuous - ratio|continuous - ratio|categorical - dichotomous|

Table 1: Proposed variable types compared to the original data set

The four measurements are classed as ratio rather than interval variables because each has a true zero (a body mass of 0 g would mean no mass at all), so statements such as "twice as heavy" are meaningful [4].

According to Real Python, citing the *pandas Cookbook*, the object data type is *“a catch-all for columns that pandas doesn’t recognize as any other specific type.”* In practice, it often means that all of the values in the column are strings [5].

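The "bill_length_mm" and "bill_depth_mm" values are recorded to one decimal place (for example 41.1 in the summary above), so they should be stored as floats. Converting them to integers would discard data. The "flipper_length_mm" and "body_mass_g" values are whole numbers and could be stored as integers. However, pandas reads such columns as floats whenever they contain missing values, since `NaN` is itself a float; the nullable `Int64` dtype is one way to keep integers alongside missing values. Floats can produce small rounding errors when Python is used for analysis, but at the precision of these measurements the effect is negligible.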
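The rounding errors mentioned above come from binary floating point: most decimal fractions cannot be stored exactly. Here is a short illustration, with hypothetical bill lengths, of how the errors show up and two common ways of handling them. The first is comparing with a tolerance; the second is rounding results to the precision of the measurements.

```python
import math

total = 0.1 + 0.2
print(total)                       # 0.30000000000000004
print(total == 0.3)                # False: exact equality between floats is fragile
print(math.isclose(total, 0.3))    # True: compare with a tolerance instead

# Hypothetical bill lengths (mm); round results to the precision of the measurements
lengths = [39.1, 39.5, 40.3]
mean_length = round(sum(lengths) / len(lengths), 1)
print(mean_length)                 # 39.6
```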

***
## References

[1] The Simple Math Problem We Still Can’t Solve. *Quanta Magazine*, Sept. 22, 2020. https://www.quantamagazine.org/why-mathematicians-still-cant-solve-the-collatz-conjecture-20200922/ (visited on 10/10/2023).

[2] Source of Method 2. copyprogramming.com. https://copyprogramming.com/howto/collatz-conjecture-in-python (visited on 11/10/2023).

[3] W. Ren, S. Li, R. Xiao and W. Bi, "Collatz Conjecture for 2^100000-1 Is True - Algorithms for Verifying Extremely Large Numbers," 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China, 2018, pp. 411-416, doi: 10.1109/SmartWorld.2018.00099. https://ieeexplore.ieee.org/document/8560077 (visited on 11/10/2023).

[4] Laerd Statistics. https://statistics.laerd.com/statistical-guides/types-of-variable.php (visited on 24/10/2023).

[5] Real Python. https://realpython.com/pandas-python-explore-dataset/#getting-to-know-your-data (visited on 24/10/2023).