# Fundamentals of Data Analysis Tasks Notebook

Sean Humphreys

----

## Task 1 - Collatz Conjecture
----

Task 1 - *Verify, using Python, that the conjecture is true for the first 10,000 positive integers.*

Often described as the simplest impossible problem, the Collatz Conjecture asks whether repeatedly applying two simple arithmetic operations will eventually transform every positive integer into 1.

$f(n) = \begin{cases} n/2 & \text{if } n \text{ is even} \\
3n + 1 & \text{if } n \text{ is odd} \end{cases}$
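
As a minimal sketch, a single application of the rule can be written as below. The name `collatz_step` is illustrative and not part of the notebook; it uses the 3n + 1 form for odd numbers, which is the form the `collatz(number)` function in Step 1 applies.

```python
def collatz_step(n):
    """Apply one Collatz step to a positive integer n."""
    if n % 2 == 0:
        return n // 2   # even: halve
    return 3 * n + 1    # odd: triple and add one


# Follow the rule from 6 until 1 is reached.
n = 6
steps = [n]
while n != 1:
    n = collatz_step(n)
    steps.append(n)
print(steps)  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```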


**Assumptions**
- If the Collatz conjecture holds true for a positive integer, its sequence ends in the repeating loop 4, 2, 1. This includes the sequences for the numbers 1 and 2.
- The script should indicate to the end user whether the Collatz Conjecture holds true for the desired range of positive integers.
- If the Collatz conjecture does not hold true for any values, the script should return the values for which it does not hold true.
- For usability purposes the script should be able to handle any range of input values, not just the first 10,000 positive integers.
- The script will not run if negative integers, floating point decimals or strings are entered as arguments.
- If these values are entered as arguments the script will give the end user clear instructions on the correct format for the arguments.

### Step 1 - Test Collatz Conjecture

Define a function to test the Collatz conjecture. 

This function is based on a code snippet accessible [here](https://www.educative.io/answers/how-to-generate-the-collatz-sequence-in-python) (last accessed 06/10/2023). 

The purpose of the *collatz(number)* function is to return the Collatz sequence, in a list, for a given positive integer. If the Collatz Conjecture does not hold true for that number, a list containing only the given number is returned. To achieve this the Collatz sequence for the given number is tested against a list that contains the numbers 4, 2, 1, as per the assumption that every Collatz sequence must end in a 4, 2, 1 loop.

The code snippet upon which the function is based calculates the Collatz sequence for the number 1 as [1] and for the number 2 as [2, 1]. This creates an issue in so far as the sequences for the numbers 1 and 2 cannot be tested against the [4, 2, 1] loop. To allow for this, an *if* and an *elif* statement are used to return sequences for the numbers 1 and 2 that include the 4, 2, 1 loop. All other values passed as an argument to the function are divided by 2 if even or, if odd, multiplied by 3 with 1 added to the result. Each new value is appended to a list. A *while* loop repeats this until the output of these mathematical operations reaches 1.

The *collatz_sequence_list* generated by the while loop is tested for Collatz compliance by comparing the last three numbers in this list against the *loop_list*. If false an *if* statement returns the *wrong_list*. If true, the function returns the *collatz_sequence_list*.
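
The slice comparison described above can be illustrated with a short sketch. The sequences are written out by hand here rather than generated, and the variable names `long_tail`, `tail_of_one` and `tail_of_two` are illustrative only.

```python
loop_list = [4, 2, 1]

# For a start value of 3 or more, a sequence that reaches 1 ends in 4, 2, 1.
long_tail = [5, 16, 8, 4, 2, 1][-3:]
print(long_tail == loop_list)  # True

# The raw sequences for 1 and 2 are shorter than three items,
# so their last-three slice can never equal the loop list.
tail_of_one = [1][-3:]
tail_of_two = [2, 1][-3:]
print(tail_of_one)  # [1]
print(tail_of_two)  # [2, 1]

# Extended with the loop, as the function does, both pass the test.
print([1, 4, 2, 1][-3:] == loop_list)  # True
print([2, 4, 2, 1][-3:] == loop_list)  # True
```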

In [1]:
def collatz(number):
    collatz_sequence_list = []
    wrong_list = []
    loop_list = [4, 2, 1]
    if number == 1:
        # to test 1 the 4,2,1 loop needs to be included to test against
        collatz_sequence_list = [1, 4, 2, 1]
    elif number == 2:
        # to test 2 the 4,2,1 loop needs to be included to test against
        collatz_sequence_list = [2, 4, 2, 1]
    else:
        collatz_sequence_list.append(number)
        while (number != 1):
            if (number % 2 == 0):
                number = number//2
                collatz_sequence_list.append(number)
            else:
                number = number*3+1
                collatz_sequence_list.append(number)
    if loop_list != collatz_sequence_list[-3:]:
        wrong_list.append(collatz_sequence_list[0])
        return wrong_list
    else:
        return collatz_sequence_list

The output of the `collatz(number)` function is tested below. For the purpose of this test, number = 57.

In [2]:
print(collatz(57))

[57, 172, 86, 43, 130, 65, 196, 98, 49, 148, 74, 37, 112, 56, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
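
As a quick sanity check, the printed sequence can be confirmed to follow the Collatz rule at every step. This is only a sketch: the list is copied from the output above, and `follows_rule` is an illustrative helper, not part of the notebook.

```python
# Output of collatz(57), copied from the cell above.
sequence = [57, 172, 86, 43, 130, 65, 196, 98, 49, 148, 74, 37, 112, 56,
            28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8,
            4, 2, 1]


def follows_rule(prev, nxt):
    """Return True if nxt is the Collatz successor of prev."""
    if prev % 2 == 0:
        return nxt == prev // 2
    return nxt == 3 * prev + 1


# Compare each value with the one that follows it.
valid = all(follows_rule(a, b) for a, b in zip(sequence, sequence[1:]))
steps = len(sequence) - 1

print(valid)  # True
print(steps)  # 32
```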


### Step 2 - Test a Range of Input Values

Define a function to test a range of values. 

Using a *for* loop, every value in a given range of values can be passed to the `collatz(number)` function. There are two possible outcomes that must be accounted for:
1. The number does not satisfy the Collatz conjecture. In this outcome a list 1 number long is returned.
2. The number does satisfy the Collatz conjecture. In this outcome a list longer than 1 number is returned.

Each of these outcomes is handled separately. Two functions are defined to achieve this.

1. The `wrong_list(lower, upper)` function uses a *for* loop to check the length of the list returned from the `collatz(number)` function for every value in the desired range. If the length of the returned list is 1, i.e. the Collatz conjecture does not hold true for that number, that number is appended to a list and returned by the function on completion of the *for* loop.

In [3]:
def wrong_list(lower, upper):
    # collect every value for which collatz() returns the 1-item failure list
    failed = []
    for i in range(lower, upper):
        if len(collatz(i)) == 1:
            failed.append(i)
    return failed

2. The `right_list(lower, upper)` function uses a *for* loop to check the length of the list returned from the `collatz(number)` function for every value in the desired range. If the length of the returned list is greater than 1, i.e. the Collatz conjecture is true for that number, that number is appended to a list that is returned by the function on completion of the *for* loop.

In [4]:
def right_list(lower, upper):
    # collect every value for which collatz() returns a full sequence
    passed = []
    for i in range(lower, upper):
        if len(collatz(i)) > 1:
            passed.append(i)
    return passed
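
The two functions each call `collatz(number)` once per value, so the range is processed twice. As a hedged alternative, the sketch below splits the range in a single pass. The names `partition_range` and `fake_collatz` are hypothetical; `fake_collatz` is a stand-in that pretends 3 fails purely so that both lists have content, and it says nothing about the real conjecture.

```python
def partition_range(lower, upper, sequence_fn):
    """Split range(lower, upper) into (holds, fails) in a single pass.

    sequence_fn is expected to behave like collatz(number): a list of
    length 1 signals failure, a longer list signals success.
    """
    holds, fails = [], []
    for i in range(lower, upper):
        if len(sequence_fn(i)) > 1:
            holds.append(i)
        else:
            fails.append(i)
    return holds, fails


# Stand-in for collatz(number), for illustration only.
def fake_collatz(n):
    return [n] if n == 3 else [n, 4, 2, 1]


holds, fails = partition_range(1, 6, fake_collatz)
print(holds)  # [1, 2, 4, 5]
print(fails)  # [3]
```

In the notebook, `collatz` itself would be passed as `sequence_fn`.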

### Step 3 - Tell the User the Result

Define a function to tell the user the values for which the Collatz conjecture does or does not hold true in a given range.

The input arguments for this function are the lower and upper values for the range of values to be tested.

Four variables are defined:
1. wrong_list_var = a list containing the output of the `wrong_list(lower, upper)` function for the range of values given as arguments
2. right_list_var = a list containing the output of the `right_list(lower, upper)` function for the range of values given as arguments
3. string = string to be printed
4. result = the values in wrong_list_var joined into a single comma-separated string, so that the list is printed without brackets

If the length of the list returned from `wrong_list(lowest_number, highest_number)` is greater than 0, the concatenated *string* and *result* variables are printed. Otherwise, if the length of the list returned from `right_list(lowest_number, highest_number)` is greater than 0, a formatted string confirming that the Collatz conjecture holds for the given range of values is printed.
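
The bracket-free formatting relies on `str.join`, sketched below. The values in `wrong_values` are made up for illustration; no failing values are known.

```python
wrong_values = [27, 97, 871]  # made-up values, for illustration only

# str.join only accepts strings, so each integer is converted first.
joined = ', '.join(str(item) for item in wrong_values)

print(joined)             # 27, 97, 871
print(str(wrong_values))  # [27, 97, 871]
```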

In [5]:
def result(lowest_number, highest_number):
    wrong_list_var = wrong_list(lowest_number, highest_number)
    right_list_var = right_list(lowest_number, highest_number)
    string = 'The numbers that Collatz Conjecture does not hold true for are: '
    result = ', '.join(str(item) for item in wrong_list_var)
    if len(wrong_list_var) > 0:
        print(string + result)
    elif len(right_list_var) > 0:
        print(
            f'The Collatz Conjecture holds true for the numbers from {lowest_number} up to and including {highest_number-1}.')

The `result(lowest_number, highest_number)` function output is demonstrated below for the first 10,000 positive integers.

In [6]:
result(1,10001)

The Collatz Conjecture holds true for the numbers from 1 up to and including 10000.


### Make the Script Reusable

Importing the sys module makes the script easier to reuse. The script can be run from the command line, with the range of values to be tested passed as command line arguments.

In [7]:
import sys

The following variables are defined and cast to integers:
1. low_number = sys.argv[1]. sys.argv[1] is the first argument passed from the command line. This should be the start of the range to test. The risk is that the user enters a negative value, a floating point number or a string.
2. high_number = sys.argv[2] + 1. sys.argv[2] is the second argument passed from the command line. This should be the end of the range to be tested. 1 is added to this value because `range()` excludes its upper bound. The risk is that the user enters a negative value, a floating point number, a string or a value that is lower than sys.argv[1].

To mitigate the risks outlined above a combination of error handling and *if*, *elif* and *else* statements is used. The code contained in the *try* block will run unless a *ValueError* or an *IndexError* is raised. If this happens, instructions are printed advising the user of the corrective action to take when re-running the script. If no error is raised, the `result(low_number, high_number)` function will test the range of values given as command line arguments only if the *low_number* value is a positive integer that is less than the *high_number* variable.

In [8]:
try:
    low_number = int(sys.argv[1]) # cast to int
    high_number = int(sys.argv[2]) + 1 # add 1 because range() excludes its upper bound
    if low_number > 0 and low_number < high_number:
        result(low_number, high_number)
    elif low_number > 0:
        # high_number holds the second argument plus 1, so subtract 1 to report the value entered
        print(f'{high_number - 1} is lower than {low_number}. The script requires that the second argument is not '
              f'lower than the first. Please run the script again with the correct parameters.')
    else:
        print(f'{low_number} is not a positive integer. Please run the script again and enter a positive integer as '
              f'an argument.')
except ValueError: # an argument could not be cast to int
    print('Please enter a positive integer as an argument.')
except IndexError: # fewer than two arguments were supplied
    print('Please enter two command line arguments. The arguments must be positive integers with the first number '
          'no greater than the second number.')

Please enter a positive integer as an argument.


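The execution of the code above demonstrates the error handling. When the cell is run inside the notebook rather than from the command line, the user supplies no range and the value in sys.argv[1] cannot be cast to an integer, so the *ValueError* code block is executed.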
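
The same checks can be exercised without the command line by moving them into a function that takes a list of strings. The sketch below is one possible arrangement, not the notebook's method: `parse_range` and its messages are hypothetical, and it raises a `ValueError` for every kind of bad input rather than printing directly.

```python
def parse_range(args):
    """Return (low, high_exclusive) from two command line strings.

    Raises ValueError with a readable message if the input is unusable.
    """
    if len(args) != 2:
        raise ValueError('Please enter two positive integers.')
    try:
        low, high = int(args[0]), int(args[1])
    except ValueError:
        raise ValueError('Both arguments must be whole numbers.') from None
    if low < 1:
        raise ValueError(f'{low} is not a positive integer.')
    if high < low:
        raise ValueError(f'{high} is lower than {low}.')
    return low, high + 1  # add 1 because range() excludes its upper bound


# Try a valid range followed by each kind of invalid input.
for args in (['1', '10000'], ['-5', '10'], ['1.5', '10'], ['10', '1'], ['7']):
    try:
        print(parse_range(args))
    except ValueError as error:
        print(error)
```

In a script, the function would be called as `parse_range(sys.argv[1:])` and the returned pair passed to `result`.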

### Task 1 References

Educative: Interactive Courses for Software Developers. (n.d.). How to generate the Collatz sequence in Python. [online] Available at: https://www.educative.io/answers/how-to-generate-the-collatz-sequence-in-python [Accessed 6 Oct. 2023].


### Task 1 Background Reading

bobbyhadz.com. (n.d.). Print a List without the Commas and Brackets in Python | bobbyhadz. [online] Available at: https://bobbyhadz.com/blog/python-print-list-without-commas-and-brackets [Accessed 6 Oct. 2023].

Chaudhuri, D.A.K. (2020). Collatz Conjecture-the simplest impossible problem. [online] Cooking Cosmos. Available at: https://asischaudhuri.wordpress.com/2020/11/09/collatz-conjecture/ [Accessed 6 Oct. 2023].

www.w3schools.com. (n.d.). Python Try Except. [online] Available at: https://www.w3schools.com/python/python_try_except.asp. [Accessed 6 Oct. 2023].

 