## Importing Libraries


In [4]:
# Numerical arrays.
import numpy as np

# Data frames.
import pandas as pd

## Task 1: The Collatz conjecture
***
The Collatz conjecture is a famous unsolved problem in mathematics. The problem is to prove that, starting from any positive integer x and repeatedly applying the function f(x) below, you always end up in the repeating cycle 1, 4, 2, 1, 4, 2, ...
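
Written out as a formula, the rule that the code below implements is:

$$
f(x) =
\begin{cases}
x / 2 & \text{if } x \text{ is even} \\
3x + 1 & \text{if } x \text{ is odd}
\end{cases}
$$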

In [44]:
def f(x):
    # If x is even, divide it by two
    if x % 2 == 0:
        return x // 2
    # If x is odd, multiply by 3 and add 1
    else:
        return 3 * x + 1

In [45]:
def collatz(x):
    sequence = []
    while x != 1:
        sequence.append(x)
        x = f(x)
    sequence.append(1)  # Add 1 as the end of the sequence
    return sequence
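
As a quick sanity check, it can help to trace one short sequence by hand and compare it with the code. The sketch below is self-contained and uses a hypothetical helper name, `collatz_sequence`, so that it does not clash with the functions defined above; it applies the same even/odd rule.

```python
def collatz_sequence(x):
    """Return the Collatz sequence from x down to 1 (hypothetical helper)."""
    sequence = [x]
    while x != 1:
        # Halve even numbers; map odd numbers to 3x + 1.
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        sequence.append(x)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Starting from 6, the sequence reaches 1 after eight applications of the rule.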

In [48]:
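def collatz_test(limit, max_steps=10_000):
    # Check every starting value from 1 to limit
    for i in range(1, limit + 1):
        x = i
        steps = 0
        # Apply f until we reach 1 or run out of steps, so a
        # counterexample cannot trap us in an endless loop
        while x != 1 and steps < max_steps:
            x = f(x)
            steps += 1
        if x != 1:
            return False, i  # Shows if it fails and the failing number
    return True, None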

In [49]:
# Check our test for the first 10,000 numbers
result, failing_number = collatz_test(10000)

if result:
    print("Yes, the Collatz conjecture holds true for the first 10,000 positive integers. This means that if you start with any one of the first 10,000 positive integers and apply the Collatz transformation repeatedly, you will eventually reach the number 1.")
else:
    print(f"The Collatz Conjecture fails at integer: {failing_number}.")

Yes, the Collatz conjecture holds true for the first 10,000 positive integers. This means that if you start with any one of the first 10,000 positive integers and apply the Collatz transformation repeatedly, you will eventually reach the number 1.
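
Beyond a yes/no answer, it can be informative to see how long each starting value takes to reach 1. The following is a minimal sketch rather than part of the original task: `collatz_steps` and `longest_start` are hypothetical helper names, and the loop assumes every value in the range does reach 1 (which the check above suggests for the first 10,000 integers).

```python
def collatz_steps(x):
    """Count how many applications of the Collatz rule take x to 1."""
    steps = 0
    while x != 1:
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        steps += 1
    return steps


def longest_start(limit):
    """Return (start, steps) for the start in 1..limit with the most steps."""
    best = max(range(1, limit + 1), key=collatz_steps)
    return best, collatz_steps(best)


start, steps = longest_start(10_000)
print(f"Longest run up to 10,000 starts at {start} and takes {steps} steps.")
```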


***

## End

## Task 2: Penguins Dataset

### Overview
The Penguins Dataset is a well-known and valuable resource within the data analysis and machine learning communities. It comprises a collection of penguin measurements from the Palmer Archipelago in Antarctica. The dataset is used for a variety of research and educational purposes, ranging from ecological studies to introductory data science courses.

### Selection of Variables in the Dataset

#### Categorical Variables
Representing nominal scale variables, these include:
- Species: Data on three species (Chinstrap, Adélie, and Gentoo), each with distinct physical characteristics.
- Island: Information from three islands (Dream, Torgersen, and Biscoe) in the Palmer Archipelago.
- Sex: Differentiating between male and female penguins, with notable size differences.

In pandas, these are best represented with the `category` data type, which uses less memory and is often faster than the `object` type typically used for strings. For that reason, the categorical variables in the Penguins Dataset are converted to `category` when the data is loaded. This also gives the data semantic meaning and improves operations such as grouping and sorting.
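
To illustrate the memory difference, here is a small sketch that compares the two representations. The labels are synthetic (three species names repeated many times), not rows from the Penguins Dataset, and the exact byte counts will vary with the pandas version.

```python
import pandas as pd

# Synthetic labels for illustration: three species names repeated 1,000 times.
labels = ["Adelie", "Chinstrap", "Gentoo"] * 1000

as_object = pd.Series(labels, dtype="object")
as_category = as_object.astype("category")

# deep=True counts the Python string objects themselves, not just pointers.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes} bytes")
print(f"category: {category_bytes} bytes")

# A categorical stores each distinct label once, plus small integer codes.
print(list(as_category.cat.categories))
```

The saving comes from storing each distinct label once and referring to it by a small integer code, so it is largest when a column has few distinct values repeated many times.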

#### Numerical Variables
Comprising interval or ratio scale variables, these include:
- Bill Length and Depth: Measuring the culmen length (from the tip of the beak to the base of the nasal bone) and depth.
- Flipper Length: Distance from the flipper tip to the body joint.
- Body Mass: The weight of penguins in grams.

In pandas, these variables are best represented as `float64` or `int64` data types, depending on the precision required. `float64` is the better fit for the Penguins Dataset, as it is the standard double-precision floating-point type in Python and NumPy. Furthermore, it offers a good balance between memory usage and precision in mathematical computations.
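
One likely reason that whole-number measurements such as body mass are read as `float64` rather than `int64` is that pandas marks missing values with `NaN`, which is a floating-point value. A minimal sketch of that behaviour, using made-up values rather than rows from the dataset:

```python
import pandas as pd

# Illustrative values only, not rows from the Penguins Dataset.
complete = pd.Series([3750, 3800, 3250])
with_missing = pd.Series([3750, 3800, None])

print(complete.dtype)      # an integer dtype, since nothing is missing
print(with_missing.dtype)  # float64, because None is stored as NaN

# pandas' nullable integer dtype keeps whole numbers alongside missing values.
nullable = pd.Series([3750, 3800, None], dtype="Int64")
print(nullable.dtype)
```

Whether to keep `float64` or switch to the nullable `Int64` type is a design choice; `float64` is the simpler default and works directly with most numerical routines.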

### Applications and Significance
The dataset is extensively used to:
- Investigate relationships between species and physical traits.
- Analyse distribution across islands.
- Study gender-based physical differences.
- Develop predictive models for body mass.
- Track population trends.

### References
The dataset is attributed to A. M. Horst, A. P. Hill, and K. B. Gorman (2020), who compiled it as 'palmerpenguins: Palmer Archipelago (Antarctica) penguin data' (R package version 0.1.0). The foundational data come from Gorman, Williams, and Fraser's 2014 study in PLoS ONE.

In [28]:
# Importing libraries
import pandas as pd

# Importing the Penguins Dataset using CSV provided on ATU Moodle
url = "https://github.com/nf-me/fundamentals_dataanalysis/blob/master/data/penguins.csv?raw=true"
penguins_df = pd.read_csv(url)

# Displaying the initial data types
print("\nInitial Data Types:")
print(penguins_df.dtypes)

# Converting categorical variables to 'category' type
convert_category = ['species', 'island', 'sex']
for column in convert_category:
    penguins_df[column] = penguins_df[column].astype('category')

# Verifying the changes in data types
print("\nData Types after Conversion:")
print(penguins_df.dtypes)


Initial Data Types:
species               object
island                object
bill_length_mm       float64
bill_depth_mm        float64
flipper_length_mm    float64
body_mass_g          float64
sex                   object
dtype: object

Data Types after Conversion:
species              category
island               category
bill_length_mm        float64
bill_depth_mm         float64
flipper_length_mm     float64
body_mass_g           float64
sex                  category
dtype: object
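
As an alternative to converting after loading, `pd.read_csv` accepts a `dtype` mapping, so the categorical columns can be typed as the file is read. The sketch below uses a tiny made-up CSV held in memory (the values are illustrative, not taken from `penguins.csv`) so that it runs without network access; `sample_df` and `category_columns` are names chosen for this example.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for penguins.csv (illustrative values only).
csv_text = """species,island,bill_length_mm,sex
Adelie,Torgersen,39.1,male
Gentoo,Biscoe,46.5,female
Chinstrap,Dream,49.0,
"""

category_columns = ["species", "island", "sex"]

# Ask read_csv to build the categorical columns directly.
sample_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={column: "category" for column in category_columns},
)

print(sample_df.dtypes)

# A missing entry stays missing; it does not become a category of its own.
print(sample_df["sex"].isna().sum())
```

Both approaches should give the same column types; typing at read time simply avoids creating the intermediate `object` columns.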


***

## End

## Task 3: [enter title]
***
Brief description of the tasks here

## Task 4: [enter title]
***
Brief description of the tasks here

## Task 5: [enter title]
***
Brief description of the tasks here