<h1>1. Square Root 2</h1>

Task: Write a Python function called sqrt2 that calculates and prints to the screen the square root of 2 to 100 decimal places. Your code should not depend on any module from the standard library or otherwise. You should research the task first and include references and a description of your algorithm.

The square root of 2 to 100 decimal places is

1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727 

(according to https://catonmat.net/tools/generate-sqrt2-digits and https://apod.nasa.gov/htmltest/gifcity/sqrt2.1mil)

There are several methods in Python that can be used to calculate a square root, but these need to be imported from modules such as *math* or *numpy*. However, the square root of 2 can also be calculated as the one-half power of 2 (https://en.wikipedia.org/wiki/Square_root_of_2) since

(x<sup>1/2</sup>) * (x<sup>1/2</sup>) = x<sup>(1/2 + 1/2)</sup> = x<sup>1</sup> (http://mathforum.org/library/drmath/view/65402.html)

Thus it can be simply calculated as follows:

In [3]:
# Calculating square root 2 with half exponent
exp_half = 2 ** .5
exp_half

1.4142135623730951

This gives the square root of 2 without importing any modules, but the result does not have the required 100 decimal places, and its final (16th) decimal digit is a 1 rather than the expected 0. Perhaps it could simply be formatted to display 100 decimal places.

In [4]:
# Formatting to 100 decimal places
"{:.100f}".format(exp_half)

'1.4142135623730951454746218587388284504413604736328125000000000000000000000000000000000000000000000000'

While more decimal places are returned here, they become incorrect after the 15<sup>th</sup> decimal place. This is because a floating point number holds only a fixed number of binary digits, while root 2 is an irrational number: its expansion is infinite and never repeats, so it can only be stored approximately.

Computers store floating point numbers in binary (base 2), so decimal fractions have to be converted into base 2 fractions. A fraction can be represented exactly only if its denominator, in lowest terms, is a power of 2 (https://stackoverflow.com/a/588014); all other numbers can only be approximated, so rounding errors are inherent to calculations with them (https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html).
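One way to see this without importing anything is the built-in float method as_integer_ratio(), which returns the exact fraction a float actually stores. In this small sketch, 0.5 is stored exactly as 1/2, while 0.2 is stored as a nearby fraction whose denominator is a power of 2 (2<sup>54</sup>) rather than 5.

```python
# Exact fraction stored for each float (built-in method, no imports needed)
for value in (0.5, 0.2):
    numerator, denominator = value.as_integer_ratio()
    print(value, "is stored as", numerator, "/", denominator)

# 0.2 is not stored as 1/5: the stored denominator is a power of 2
print((0.2).as_integer_ratio() == (1, 5))      # False
print((0.2).as_integer_ratio()[1] == 2 ** 54)  # True
```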

The example below demonstrates these errors. While 1.5 - 1.0 == 0.5 returns True, 1.2 - 1.0 == 0.2 returns False. This is because 0.5 = 1/2, whose denominator is a power of 2, whereas 1.2 = 6/5 and 0.2 = 1/5 have a denominator of 5, which is not a power of 2, so these numbers can only be approximated. The approximation looks exact when only a few decimal places are displayed, but the error becomes visible as more decimal places are shown. The same thing happened in the calculation of root 2 above: the digits are correct up to the 15<sup>th</sup> decimal place but then become inaccurate.

In [37]:
# Demonstrating floating point number calculations
print("Is 1.5 - 1.0 == 0.5?")
print(1.5 - 1.0 == 0.5)
print("")

print("Is 1.2 - 1.0 == 0.2?")
print(1.2 - 1.0 == 0.2)
print("")

print("What is 1.2 - 1.0?")
print(1.2 - 1.0)
print("")

print("Expressing 1.5 and 1.2 to 1 decimal place:")
print(1.5)
print(1.2)
print("")

print("Expressing 1.5 and 1.2 to 10 decimal places:")
print("{:.10f}".format(1.5))
print("{:.10f}".format(1.2))
print("")

print("Expressing 1.5 and 1.2 to 20 decimal places:")
print("{:.20f}".format(1.5))
print("{:.20f}".format(1.2))

Is 1.5 - 1.0 == 0.5?
True

Is 1.2 - 1.0 == 0.2?
False

What is 1.2 - 1.0?
0.19999999999999996

Expressing 1.5 and 1.2 to 1 decimal place:
1.5
1.2

Expressing 1.5 and 1.2 to 10 decimal places:
1.5000000000
1.2000000000

Expressing 1.5 and 1.2 to 20 decimal places:
1.50000000000000000000
1.19999999999999995559
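The same kind of rounding affects the float value of root 2 computed earlier. As a small check with no imports: by the exponent rule above, squaring 2 ** .5 should give exactly 2, but with floats the result comes out slightly larger than 2.

```python
# By the exponent rule, (2 ** .5) * (2 ** .5) should equal 2 exactly
root = 2 ** .5
square = root * root
print(square)        # 2.0000000000000004
print(square == 2)   # False
```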


While the earlier calculation of root 2 to 100 decimal places became inaccurate after a certain number of digits, it also looks as if the program stopped calculating altogether: every digit after the 52<sup>nd</sup> decimal place is a zero.

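This is because there is a finite number of bits in which to store a floating point number. Almost all machines today store Python floats in IEEE 754 double precision, which has 53 bits of precision (https://docs.python.org/3.4/tutorial/floatingpoint.html). For a number between 1 and 2, such as root 2, 52 of those bits come after the binary point, so the stored value is a fraction with denominator 2<sup>52</sup>, and the exact decimal expansion of such a fraction ends after at most 52 decimal places. That is why the answer above ends in trailing zeros. It is not possible to store an irrational number such as root 2 exactly in bits, as that would require storing something infinite in a finite amount of space.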
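This can be checked with the built-in as_integer_ratio() method, again without imports. In this sketch, because 10<sup>52</sup> = 2<sup>52</sup> × 5<sup>52</sup>, the stored fraction n/2<sup>52</sup> equals n × 5<sup>52</sup>/10<sup>52</sup>, so multiplying the numerator by 5<sup>52</sup> gives the exact decimal digits of the stored value using integers only.

```python
numerator, denominator = (2 ** .5).as_integer_ratio()
print(denominator == 2 ** 52)   # True: the stored value is numerator / 2**52

# numerator / 2**52 == numerator * 5**52 / 10**52, so these are the exact digits
exact_digits = str(numerator * 5 ** 52)
print(exact_digits[0] + "." + exact_digits[1:])
print(len(exact_digits) - 1, "decimal places")
```

The printed digits should match the non-zero part of the 100-decimal-place output above.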

Since the issues here come from the limited precision of floating point numbers, perhaps representing 2 as a very large integer (2 followed by many zeros) will bypass them. Multiplying a number by 10<sup>200</sup> multiplies its square root by 10<sup>100</sup>, so to get 100 decimal places, 200 zeros are appended to 2 and the square root of the resulting integer is calculated. The code below uses the same 0.5 exponent method as before.

In [5]:
# 2 followed by 200 zeros: its square root is root 2 multiplied by 10**100
large_int = 2 * 10**200
exp_int = large_int ** .5

exp_int

1.414213562373095e+100

The answer is so large that it is displayed in scientific (exponent) notation. The number is formatted below to view it in full, this time without specifying the number of decimal places, as it should be an integer.

In [7]:
# Format to view number without exponent
'{:f}'.format(exp_int)

'14142135623730950271424125632818586983491648817919875481779003888601306842716543030228210043498528768.000000'

While this method appears to return the required number of digits, albeit as an integer, the digits again become incorrect, this time after the 17<sup>th</sup> significant digit (14142135623730950 matches the reference value, but the next digit is a 2 instead of a 4). This is because raising an integer to the power .5 converts it to a floating point number first, so the result is a float and the same precision limits apply.
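A quick way to confirm that the conversion to float is where precision is lost (a sketch, no imports): Python compares integers and floats exactly, so comparing the converted value with the original integer shows whether 2 × 10<sup>200</sup> survives the conversion.

```python
large_int = 2 * 10**200
as_float = float(large_int)        # the conversion that ** .5 performs implicitly

# 2 * 10**200 needs far more than 53 significant bits, so it cannot be stored exactly
print(as_float == large_int)       # False
print(int(as_float) - large_int)   # the (non-zero) rounding error
```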

The next approach to try is Newton's method of calculating a square root.

Note that the result of the example below differs from 2 ** .5 above. The reference digits (https://catonmat.net/tools/generate-sqrt2-digits) begin 1.4142135623730950, which makes the exponent calculation above more accurate than the code below: the loop stops as soon as r * r is within 10<sup>-10</sup> of x, so only the first 11 decimal places are correct.

In [8]:
# https://hackernoon.com/calculating-the-square-root-of-a-number-using-the-newton-raphson-method-a-how-to-guide-yr4e32zo
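def mySqrt(x):
    r = x                     # initial estimate: the number itself
    precision = 10 ** (-10)   # stop once r * r is within this distance of x

    while abs(x - r * r) > precision:
        r = (r + x / r) / 2   # Newton step: mean of r and x / r

    return r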

mySqrt(2)

1.4142135623746899

Another way to overcome the floating point limitations is to perform the calculation digit by digit:
https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Digit-by-digit_calculation


The Wikipedia explanation is hard to follow, so the notes below follow the algorithm section of https://www.homeschoolmath.net/teaching/square-root-algorithm.php with help from https://www.youtube.com/watch?v=50oan6UYrX4&feature=emb_logo&ab_channel=SchoolMathsByThomasAnthikat

- This method works with pairs of digits, so 2 needs to be written as 02.00 00 00 ... (the number of zero pairs after the decimal point depends on how many decimal places are wanted in the answer). Each pair of digits produces a single digit of the solution.
    - so 02.00 00 00 gives 4 digits of the solution (1.414)
    - this is why the sqrt2 function below multiplies by 10<sup>2 × digits</sup>: two zeros for each decimal place
    - scaling by a power of 10 means the calculation can be done with integers only, avoiding floats
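The pair method described above can be sketched with integers only. This is an illustrative version, not taken from the linked pages, and the function name sqrt_digit_by_digit is hypothetical. At each step it brings down the next pair of digits and picks the largest digit d such that (20 × root + d) × d does not exceed the current remainder, which is the "double the root so far, append a digit, multiply" step of the long-hand method written in integer form.

```python
def sqrt_digit_by_digit(n, places):
    """Long-hand (pair-by-pair) square root of a non-negative integer n.

    Returns the square root of n truncated to `places` decimal places,
    as an integer with the decimal point removed (e.g. 14142 for 1.4142).
    """
    # Append two zeros per decimal place, as in 02.00 00 00
    scaled = n * 100 ** places

    # Split into pairs of digits, starting from the right
    pairs = []
    while scaled > 0:
        pairs.append(scaled % 100)
        scaled //= 100
    pairs.reverse()

    root = 0
    remainder = 0
    for pair in pairs:
        # Bring down the next pair
        remainder = remainder * 100 + pair
        # Largest digit d with (20 * root + d) * d <= remainder
        digit = 0
        while (20 * root + digit + 1) * (digit + 1) <= remainder:
            digit += 1
        remainder -= (20 * root + digit) * digit
        root = root * 10 + digit
    return root


print(sqrt_digit_by_digit(2, 4))    # 14142
print(sqrt_digit_by_digit(2, 100))
```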

Is the sqrt2 function below using this long-hand calculation? According to https://www.coursera.org/lecture/progfun1/lecture-1-5-example-square-roots-with-newtons-method-FQDE1, the long-hand method is good for manual calculations but not for computing.


The function below (taken from https://stackoverflow.com/a/5189881) works by first multiplying the number by 10<sup>2 × (number of decimal places required)</sup> and then finding the integer square root of the result.


// is floor division: it divides and rounds down to the nearest integer.

>> is the bitwise right shift: x >> y shifts the bits of x right by y places, which is the same as floor dividing x by 2<sup>y</sup>. So shifting right by 1 gives the same answer as floor dividing by 2 (https://stackoverflow.com/a/8646495).
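A quick check of this equivalence for a few integers (a sketch, no imports; 28284 is the sum 14166 + 14118 that appears when working through sqrt2(2, 4) by hand):

```python
# x >> y moves the bits of x right by y places, which floor divides x by 2**y
for x in (0, 1, 7, 200000000, 28284):
    print(x, x >> 1, x // 2, x >> 1 == x // 2)

print(bin(28284), bin(28284 >> 1))   # the bits move one place to the right
```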

It is also possible to calculate an integer square root using bitwise operations:
- https://stackoverflow.com/questions/3174666/finding-the-square-root-of-a-given-number-using-bitwise-operations
- https://en.wikipedia.org/wiki/Integer_square_root#Using_bitwise_operations
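A sketch of the bitwise approach, written from the general description of the base-2 digit-by-digit method rather than copied from either link; the name isqrt_bitwise is hypothetical. It works like the pair method above but in base 2, so each "pair" is two bits and every step needs only comparisons, additions, subtractions and shifts.

```python
def isqrt_bitwise(n):
    """Integer square root (rounded down) of a non-negative integer, using shifts."""
    if n < 0:
        raise ValueError("n must be non-negative")

    # Start from the largest power of 4 that is <= n (one "pair" of bits)
    bit = 1
    while bit * 4 <= n:
        bit <<= 2

    result = 0
    remainder = n
    while bit:
        if remainder >= result + bit:
            remainder -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2          # move on to the next pair of bits
    return result


print(isqrt_bitwise(200000000))     # 14142
print(isqrt_bitwise(2 * 10**200))   # root 2 to 100 decimal places, without the decimal point
```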

In [9]:
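def sqrt2(a, digits):
    a = a * (10**(2*digits))  # scale a so its integer square root carries `digits` decimal places (2 with 200 zeros for digits=100)
    x_prev = 0                # holds the previous estimate for comparison with x_next
    x_next = 1 * (10**digits) # initial estimate: 1 scaled by 10**digits
    # note: for some inputs (e.g. sqrt2(3, 0)) the estimates alternate between two values
    # and this loop never ends; sqrt2(2, 100) below settles on a single value
    while x_prev != x_next:   # loop ends when two successive estimates are equal
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) // 2  # Newton step; previously >> 1, which is the same as floor dividing by 2
    return x_next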

x = sqrt2(2, 100)
print(x)

len(str(x)) # 101 digits, including the 1 before the decimal point


14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727


101

This is what the code above does:

Newton's method uses successive approximations - explained here https://www.coursera.org/lecture/progfun1/lecture-1-5-example-square-roots-with-newtons-method-FQDE1

So for root 2:

Start with an initial estimate y = 1, then repeatedly improve the estimate by taking the mean of y and x/y (here x = 2):

- Estimate: 1, quotient: 2/1 = 2, mean: (2 + 1)/2 = 1.5
- Estimate: 1.5, quotient: 2/1.5 ≈ 1.3333, mean: (1.3333 + 1.5)/2 ≈ 1.4167
- Estimate: 1.4167, quotient: 2/1.4167 ≈ 1.4117, mean: (1.4167 + 1.4117)/2 = 1.4142
- etc.
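The same iteration can be sketched with floats (no imports; the helper name newton_estimates is hypothetical). Starting from y = 1 reproduces the estimates above. The first step from mySqrt's starting value of 2 also gives 1.5, so the fourth estimate here should equal the value mySqrt(2) returned earlier.

```python
def newton_estimates(x, start, steps):
    """Return successive Newton estimates of the square root of x, using floats."""
    estimates = []
    y = start
    for _ in range(steps):
        y = (y + x / y) / 2    # mean of y and x / y
        estimates.append(y)
    return estimates


for estimate in newton_estimates(2, 1.0, 5):
    print(estimate)
```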

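Carried out with floats, these estimates are limited by the same floating point precision issues as before, which is why the sqrt2 function scales 2 up to a large integer and uses only integer (floor) division.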


Manually working through the code for sqrt2(2, 4) (this trace uses the earlier >> 1 version of the update, which gives the same result as // 2):

- a = 2 * 10**(2*4) = 200000000
- x_prev = 0, x_next = 1 * 10**4 = 10000

|Iteration   |x_prev   |x_next = (x_prev + a // x_prev) >> 1   |
|---|---|---|
|1   |10000   |15000   |
|2   |15000   |14166   |
|3   |14166   |14142   |
|4   |14142   |14142   |

The loop stops after iteration 4 because x_prev == x_next, and the function returns 14142, i.e. 1.4142.
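The same trace can be produced automatically by recording each pair of estimates. A sketch (the name sqrt2_trace is hypothetical; the loop body is the same as in sqrt2 above):

```python
def sqrt2_trace(a, digits):
    """Run the sqrt2 loop and record (x_prev, x_next) after each iteration."""
    a = a * (10 ** (2 * digits))
    x_prev = 0
    x_next = 1 * (10 ** digits)
    steps = []
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) // 2
        steps.append((x_prev, x_next))
    return steps


for x_prev, x_next in sqrt2_trace(2, 4):
    print(x_prev, x_next)
```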


From Ian's video:
- math.sqrt
- https://docs.python.org/3/tutorial/floatingpoint.html
- numpy: np.sqrt()
- matplotlib.pyplot
- https://tour.golang.org/flowcontrol/8 (Newton's method)
- https://www.mathjax.org/ (built in to Jupyter notebook; LaTeX math symbols)
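The task asks for a function called sqrt2 that prints root 2 to 100 decimal places, while the version above takes arguments and returns an integer without a decimal point. Below is a minimal sketch of a no-argument version that prints the result with the decimal point, still using only integers and no modules (note that it redefines sqrt2). As a small design change, it starts from an estimate known to be too large (2 × 10<sup>100</sup>) and stops as soon as the estimate no longer decreases. This is a standard stopping rule for integer Newton iteration and cannot get stuck alternating between two values.

```python
def sqrt2():
    """Print (and return) the square root of 2 to 100 decimal places, integers only."""
    digits = 100
    a = 2 * 10 ** (2 * digits)   # 2 followed by 200 zeros

    # Start above the true root (root 2 < 2) and let Newton's method bring it down
    x = 2 * 10 ** digits
    y = (x + a // x) // 2
    while y < x:                 # stop once the estimate stops decreasing
        x = y
        y = (x + a // x) // 2

    # x is now the integer square root of a; put the decimal point back in
    text = str(x)
    result = text[0] + "." + text[1:]
    print(result)
    return result


sqrt2()
```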

<h1>2. Chi Square Test for Independence</h1>

Task: The Chi-squared test for independence is a statistical hypothesis test like a t-test. It is used to analyse whether two categorical variables are independent. The Wikipedia article gives the table below as an example, stating the Chi-squared value based on it is approximately 24.6. Use scipy.stats to verify this value and calculate the associated p value. You should include a short note with references justifying your analysis in a markdown cell.

|   |A   |B   |C   |D   |Total   |
|---|---|---|---|---|---|
|White Collar   |90   |60   |104   |95   |349   |
|Blue Collar   |30   |50   |51   |20   |151   |
|No Collar   |30   |40   |45   |35   |150   |
|Total   |150   |150   |200   |150   |650   |
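As a rough cross-check on the value of about 24.6, here is a plain-Python sketch of the statistic. Each expected count is (row total × column total) / grand total, and the statistic is the sum of (observed - expected)<sup>2</sup> / expected. The task itself asks for scipy.stats, whose chi2_contingency function returns the statistic, p value, degrees of freedom and expected counts and should agree. For the p value, this sketch assumes the closed form of the chi-squared upper tail for an even number of degrees of freedom k, e<sup>-x/2</sup> × the sum over j = 0 to k/2 - 1 of (x/2)<sup>j</sup>/j!. Here k = (3 - 1)(4 - 1) = 6.

```python
import math

observed = [
    [90, 60, 104, 95],   # White collar
    [30, 50, 51, 20],    # Blue collar
    [30, 40, 45, 35],    # No collar
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-squared statistic: sum of (observed - expected)**2 / expected
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form only holds for an even number of degrees of freedom
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))

print(round(chi2, 2), dof, p_value)
```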