# Project Description

This project performs a chi-square test. Through the `choose_chi_square` function, the user chooses whether to analyze a Hardy-Weinberg equilibrium population or a Mendelian population, and the function then calls either `hardy_weinerg_equilibrium` or `mendelian`. Each of these receives the observed count of each phenotype in the population and prints the degrees of freedom and the chi-square value based on those counts. To compute the chi-square value, the expected count of each phenotype is calculated by either `hwe_expected_calculator` or `mendelian_expected_calculator`, depending on which analysis was chosen. These functions return a list of expected values, one per phenotype. This list, along with a list built from the observed values, is passed to `chi_square_calc`, which calculates the chi-square value. Finally, `p_value_eval` determines an approximate p-value from the chi-square value and the degrees of freedom, and evaluates whether the null hypothesis can be rejected.
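For reference, the statistic that the pipeline computes is Pearson's chi-square. The symbols below are introduced here for illustration and do not appear in the code: $O_i$ is the observed count in category $i$, $E_i$ is the expected count, and $k$ is the number of categories (3 for the Hardy-Weinberg analysis, 4 for the Mendelian one).

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```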

## Project Code



In [12]:
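def choose_chi_square(type_chi_square):
    
    """Choose which type of chi-square analysis to perform: Hardy-Weinberg equilibrium or Mendelian.
    
    Parameters
    ------------
    type_chi_square : str
        Type of analysis to run: "hardy weinberg equilibrium", "hwe", or "mendelian" (case-insensitive).
    
    Returns
    ---------
    None
        The user is prompted for observed counts and the results are printed.
    
    """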
    
    # Normalizes the input to lowercase so that matching is case-insensitive.
    lower_type_chi_square = type_chi_square.lower()
    
    # If input is "hardy weinberg equilibrium" or "hwe", asks the user for the observed count of each genotype.
    if lower_type_chi_square in ("hardy weinberg equilibrium", "hwe"):
        # Prints out the null hypothesis.
        print("The population is in Hardy-Weinberg equilibrium. Any deviation can be attributed to chance.")
        homo_recessive = int(input("Number of homozygous recessive individuals observed in the population: "))
        hetero = int(input("Number of heterozygous individuals observed in the population: "))
        homo_dominant = int(input("Number of homozygous dominant individuals observed in the population: "))
        # Calls the hardy_weinerg_equilibrium function to finish the chi-square test for a Hardy-Weinberg population.
        hardy_weinerg_equilibrium(homo_recessive, hetero, homo_dominant)
    # If input is "mendelian", it will ask the user the appropriate values of each phenotype.
    elif lower_type_chi_square == "mendelian":
        # Prints out the null hypothesis.
        print("There is no real difference between observed values and expected values. Any deviation can be attributed to chance.")
        dom1dom2_phen = input("What is the phenotype with both dominant alleles? ")
        dom1dom2 = int(input("Amount of individuals in the population with this phenotype: "))
        dom1res2_phen = input("What is the phenotype with the first allele being dominant and the second being recessive? ")
        dom1res2 = int(input("Amount of individuals in the population with this phenotype: "))
        res1dom2_phen = input("What is the phenotype with the first allele being recessive and the second being dominant? ")
        res1dom2 = int(input("Amount of individuals in the population with this phenotype: "))
        res1res2_phen = input("What is the phenotype with both recessive alleles? ")
        res1res2 = int(input("Amount of individuals in the population with this phenotype: "))
        # Calls the mendelian function to finish analyzing the chi square test for a mendelian population.
        mendelian(dom1dom2, dom1res2, res1dom2, res1res2)
    # If input is any other string, a message is printed and no analysis is run.
    else:
        print("That is not an option.")
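The name matching done by `choose_chi_square` can also be expressed as a lookup table, which keeps the accepted spellings in one place. The following is a minimal sketch, not the project's implementation; `ANALYSIS_ALIASES` and `normalize_analysis` are hypothetical names introduced for illustration.

```python
# Hypothetical alias table: maps each accepted spelling to a canonical analysis name.
ANALYSIS_ALIASES = {
    "hardy weinberg equilibrium": "hwe",
    "hwe": "hwe",
    "mendelian": "mendelian",
}


def normalize_analysis(type_chi_square):
    """Return the canonical analysis name, or None if the input is not recognized."""
    return ANALYSIS_ALIASES.get(type_chi_square.strip().lower())


print(normalize_analysis("HWE"))        # hwe
print(normalize_analysis("Mendelian"))  # mendelian
print(normalize_analysis("jhfk"))       # None
```

Returning `None` for unknown input lets the caller decide whether to print a message, as the function above does, or to raise an error.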

In [13]:
def hardy_weinerg_equilibrium(homo_recessive, hetero, homo_dominant):
    
    """Runs a Hardy-Weinberg equilibrium chi-square test: degrees of freedom, observed values, expected values, chi-square value, and p-value.
    
    Parameters
    ------------
    homo_recessive : int
        Observed number of homozygous recessive individuals in the population.
    hetero : int
        Observed number of heterozygous individuals in the population.
    homo_dominant : int
        Observed number of homozygous dominant individuals in the population.
        
    Returns
    ---------
    None
        The results are printed rather than returned.
    
    """
    
    # With three genotype classes and one allele frequency estimated from the data, the degrees of freedom are 3 - 1 - 1 = 1.
    degrees_of_freedom = 1
    print("Degrees of Freedom: " + str(degrees_of_freedom)) 
    # Puts the inputted observed values into a list. 
    observed_list = [homo_recessive, hetero, homo_dominant]
    # Calls the hwe_expected_calculator function to return a list of expected values.
    expected_list = hwe_expected_calculator(homo_recessive, hetero, homo_dominant)
    # Calls the chi_square_calc function to return the chi square value.
    chi_square = chi_square_calc(observed_list, expected_list)
    print("Chi Square value calculated: " + str(chi_square))
    # Calls the p_value_eval function to return the p-value and the conclusion of the chi-square test.
    print(p_value_eval(chi_square, degrees_of_freedom))
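The degrees of freedom used by the two analyses follow one rule: the number of categories, minus one, minus the number of parameters estimated from the observed data. The sketch below illustrates that rule; the helper `chi_square_df` and its parameter names are assumptions made for this example and are not part of the project.

```python
def chi_square_df(n_categories, n_estimated_parameters=0):
    """Categories minus one, minus any parameters estimated from the observed data."""
    return n_categories - 1 - n_estimated_parameters


# Hardy-Weinberg: 3 genotype classes, 1 allele frequency estimated from the data.
hwe_df = chi_square_df(3, n_estimated_parameters=1)
# Mendelian 9:3:3:1 cross: 4 phenotype classes, ratio fixed in advance.
mendelian_df = chi_square_df(4)

print(hwe_df, mendelian_df)  # 1 3
```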
    

In [14]:
def hwe_expected_calculator(homo_recessive, hetero, homo_dominant):
    
    """Calculates the expected count of each genotype under Hardy-Weinberg equilibrium.
    
    Parameters
    ------------
    homo_recessive : int
        Observed number of homozygous recessive individuals in the population.
    hetero : int
        Observed number of heterozygous individuals in the population.
    homo_dominant : int
        Observed number of homozygous dominant individuals in the population.
        
    Returns
    ---------
    expected_list : list
        List of calculated expected values.
        
    """
    
    # Total number of individuals, and of alleles (two per individual).
    total = homo_recessive + hetero + homo_dominant
    total_alleles = 2 * total
    
    # Calculates the recessive and dominant allele frequencies in the population.
    freq_allele_homo_recessive = (2 * homo_recessive + hetero) / total_alleles
    freq_allele_homo_dominant = (2 * homo_dominant + hetero) / total_alleles
    
    # Calculates the expected genotype frequencies (q^2, 2pq, p^2).
    freq_homo_recessive = freq_allele_homo_recessive ** 2
    freq_hetero = 2 * freq_allele_homo_recessive * freq_allele_homo_dominant
    freq_homo_dominant = freq_allele_homo_dominant ** 2
    
    # Calculates the expected number of individuals for each genotype in the population.
    expected_homo_recessive = freq_homo_recessive * total
    expected_hetero = freq_hetero * total
    expected_homo_dominant = freq_homo_dominant * total
    
    # Puts the calculated expected amounts in a list.
    expected_list = [expected_homo_recessive, expected_hetero, expected_homo_dominant]
    
    return expected_list
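As a worked example of the expected-count calculation, the sketch below repeats the same arithmetic in a standalone function and applies it to made-up counts. The function name `hwe_expected` and the counts 10, 20, 70 are assumptions chosen for illustration, not data from the project.

```python
def hwe_expected(homo_recessive, hetero, homo_dominant):
    """Expected genotype counts [q^2 * N, 2pq * N, p^2 * N] under Hardy-Weinberg equilibrium."""
    total = homo_recessive + hetero + homo_dominant
    # Recessive allele frequency q: two copies per homozygote, one per heterozygote.
    q = (2 * homo_recessive + hetero) / (2 * total)
    # Dominant allele frequency p, since p + q = 1.
    p = 1 - q
    return [q ** 2 * total, 2 * p * q * total, p ** 2 * total]


# Illustrative counts only: 10 homozygous recessive, 20 heterozygous, 70 homozygous dominant.
expected = hwe_expected(10, 20, 70)
print([round(value, 2) for value in expected])  # [4.0, 32.0, 64.0]
```

Here q = (2 * 10 + 20) / 200 = 0.2 and p = 0.8, so the expected counts out of 100 individuals are 4, 32, and 64.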

In [15]:
def mendelian(dom1dom2, dom1res2, res1dom2, res1res2):
    
    """Runs a Mendelian (9:3:3:1) chi-square test: degrees of freedom, observed values, expected values, chi-square value, and p-value.
    
    Parameters
    ------------
    dom1dom2 : int
        Observed number of individuals in the population with both dominant alleles.
    dom1res2 : int
        Observed number of individuals in the population who have the first dominant allele and the second recessive allele.
    res1dom2 : int
        Observed number of individuals in the population who have the first recessive allele and the second dominant allele.
    res1res2 : int
        Observed number of individuals in the population with both recessive alleles.
    
    Returns
    ---------
    None
        The results are printed rather than returned.
    
    """
    
    # Puts the inputted observed values into a list. 
    observed_list = [dom1dom2, dom1res2, res1dom2, res1res2]
    # Calls the mendelian_expected_calculator function to return a list of expected values.
    expected_list = mendelian_expected_calculator(dom1dom2, dom1res2, res1dom2, res1res2)
    # Degrees of freedom for a Mendelian analysis is the number of phenotypes minus one.
    degrees_of_freedom = len(observed_list) - 1
    print("Degrees of Freedom: " + str(degrees_of_freedom))
    # Calls the chi_square_calc function to return the chi square value.
    chi_square = chi_square_calc(observed_list, expected_list)
    print("Chi Square value calculated: " + str(chi_square))
    # Calls the p_value_eval function to return the p-value and the conclusion of the chi-square test.
    print(p_value_eval(chi_square, degrees_of_freedom))

In [16]:
def mendelian_expected_calculator(dom1dom2, dom1res2, res1dom2, res1res2):
    
    """Calculates the expected count of each phenotype from the population.
    
    Parameters
    ------------
    dom1dom2 : int
        Observed number of individuals in the population with both dominant alleles.
    dom1res2 : int
        Observed number of individuals in the population who have the first dominant allele and the second recessive allele.
    res1dom2 : int
        Observed number of individuals in the population who have the first recessive allele and the second dominant allele.
    res1res2 : int
        Observed number of individuals in the population with both recessive alleles.
    
    Returns
    ---------
    expected_list : list
        List of calculated expected values.
    
    """
    
    # Calculates the expected number of individuals for each phenotype from the 9:3:3:1 ratio.
    expected_dom1dom2 = (9/16) * (dom1dom2 + dom1res2 + res1dom2 + res1res2)
    expected_dom1res2 = (3/16) * (dom1dom2 + dom1res2 + res1dom2 + res1res2)
    expected_res1dom2 = (3/16) * (dom1dom2 + dom1res2 + res1dom2 + res1res2)
    expected_res1res2 = (1/16) * (dom1dom2 + dom1res2 + res1dom2 + res1res2)
    
    # Puts the calculated expected amounts in a list.
    expected_list = [expected_dom1dom2, expected_dom1res2, expected_res1dom2, expected_res1res2]
    
    return expected_list
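The 9:3:3:1 split can also be written against an arbitrary ratio, which makes the arithmetic easier to check. This is a sketch under assumptions: `expected_from_ratio` is a hypothetical helper, and the observed counts are made up for illustration.

```python
def expected_from_ratio(observed_list, ratio=(9, 3, 3, 1)):
    """Split the total observed count across categories in proportion to ratio."""
    if len(observed_list) != len(ratio):
        raise ValueError("observed_list and ratio must have the same length.")
    total = sum(observed_list)
    ratio_total = sum(ratio)
    return [total * part / ratio_total for part in ratio]


# Illustrative counts only: 160 individuals in total.
print(expected_from_ratio([95, 28, 27, 10]))  # [90.0, 30.0, 30.0, 10.0]
```

With 160 individuals, the expected counts are 160 * 9/16 = 90, 160 * 3/16 = 30 (twice), and 160 * 1/16 = 10.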

In [17]:
def chi_square_calc(observed_list, expected_list):
    """Calculates the chi-square value.
    
    Parameters
    -----------
    observed_list : list
        List of observed values.
    expected_list : list
        List of expected values.
        
    Returns
    ---------
    chi_square : float
        Calculated chi-square value.
        
    """
    
    # Initializes the running total chi_square to 0.
    chi_square = 0
    # Loops over the observed and expected lists together.
    for observed_value, expected_value in zip(observed_list, expected_list):
        # Adds this category's contribution, (observed - expected)^2 / expected, to the total.
        chi_square += ((observed_value - expected_value) ** 2) / expected_value
    
    return chi_square
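A short worked example makes the statistic concrete: each category contributes (observed - expected)^2 / expected, and the chi-square value is the sum of those contributions over all categories. The counts below are made up for illustration, and the variable names are local to this sketch.

```python
# Illustrative counts only: four phenotype classes with a 9:3:3:1 expectation.
observed = [95, 28, 27, 10]
expected = [90.0, 30.0, 30.0, 10.0]

# One contribution per category.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
# The statistic is the sum over all categories, not just the last term.
chi_square = sum(terms)

print([round(term, 3) for term in terms])  # [0.278, 0.133, 0.3, 0.0]
print(round(chi_square, 3))                # 0.711
```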

In [18]:
def p_value_eval(chi_square, degrees_of_freedom):
    """Determines a p-value and evaluates whether the null hypothesis will be rejected or not.
    
    Parameters
    -----------
    chi_square : float
        Calculated chi-square value.
    degrees_of_freedom : int
        Degrees of freedom of the test (1 to 4 are supported).
        
    Returns
    --------
    p_value : str
        Approximate p-value read from a chi-square table, together with the conclusion about the null hypothesis.
    
    """
    
    # Looks up the approximate p-value and its conclusion for each supported value of degrees_of_freedom.
    if degrees_of_freedom == 1:
        # Based on the chi_square value, a different p-value message is returned.
        if chi_square < 3.84 and chi_square >= 2.71:
            p_value = "p-value is about 0.10, therefore the null hypothesis is not rejected."
        elif chi_square < 2.71 and chi_square >= 1.64:
            p_value = "p-value is about 0.20, therefore the null hypothesis is not rejected."
        elif chi_square < 1.64 and chi_square >= 1.07:
            p_value = "p-value is about 0.30, therefore the null hypothesis is not rejected."
        elif chi_square < 1.07 and chi_square >= 0.46:
            p_value = "p-value is about 0.50, therefore the null hypothesis is not rejected."
        elif chi_square < 0.46 and chi_square >= 0.15:
            p_value = "p-value is about 0.70, therefore the null hypothesis is not rejected."
        elif chi_square < 0.15 and chi_square >= 0.06:
            p_value = "p-value is about 0.80, therefore the null hypothesis is not rejected."
        elif chi_square < 0.06 and chi_square >= 0.02:
            p_value = "p-value is about 0.90, therefore the null hypothesis is not rejected."
        elif chi_square < 0.02:
            p_value = "p-value is about 0.95, therefore the null hypothesis is not rejected."
        else:
            p_value = "p-value is less than 0.05, therefore there is enough evidence to reject the null hypothesis."
    elif degrees_of_freedom == 2:
        if chi_square < 5.99 and chi_square >= 4.60:
            p_value = "p-value is about 0.10, therefore the null hypothesis is not rejected."
        elif chi_square < 4.60 and chi_square >= 3.22:
            p_value = "p-value is about 0.20, therefore the null hypothesis is not rejected."
        elif chi_square < 3.22 and chi_square >= 2.41:
            p_value = "p-value is about 0.30, therefore the null hypothesis is not rejected."
        elif chi_square < 2.41 and chi_square >= 1.39:
            p_value = "p-value is about 0.50, therefore the null hypothesis is not rejected."
        elif chi_square < 1.39 and chi_square >= 0.71:
            p_value = "p-value is about 0.70, therefore the null hypothesis is not rejected."
        elif chi_square < 0.71 and chi_square >= 0.45:
            p_value = "p-value is about 0.80, therefore the null hypothesis is not rejected."
        elif chi_square < 0.45 and chi_square >= 0.21:
            p_value = "p-value is about 0.90, therefore the null hypothesis is not rejected."
        elif chi_square < 0.21:
            p_value = "p-value is about 0.95, therefore the null hypothesis is not rejected."
        else:
            p_value = "p-value is less than 0.05, therefore there is enough evidence to reject the null hypothesis."
    elif degrees_of_freedom == 3:
        if chi_square < 7.82 and chi_square >= 6.25:
            p_value = "p-value is about 0.10, therefore the null hypothesis is not rejected."
        elif chi_square < 6.25 and chi_square >= 4.64:
            p_value = "p-value is about 0.20, therefore the null hypothesis is not rejected."
        elif chi_square < 4.64 and chi_square >= 3.66:
            p_value = "p-value is about 0.30, therefore the null hypothesis is not rejected."
        elif chi_square < 3.66 and chi_square >= 2.37:
            p_value = "p-value is about 0.50, therefore the null hypothesis is not rejected."
        elif chi_square < 2.37 and chi_square >= 1.42:
            p_value = "p-value is about 0.70, therefore the null hypothesis is not rejected."
        elif chi_square < 1.42 and chi_square >= 1.01:
            p_value = "p-value is about 0.80, therefore the null hypothesis is not rejected."
        elif chi_square < 1.01 and chi_square >= 0.58:
            p_value = "p-value is about 0.90, therefore the null hypothesis is not rejected."
        elif chi_square < 0.58:
            p_value = "p-value is about 0.95, therefore the null hypothesis is not rejected."
        else:
            p_value = "p-value is less than 0.05, therefore there is enough evidence to reject the null hypothesis."
    elif degrees_of_freedom == 4:
        if chi_square < 9.49 and chi_square >= 7.78:
            p_value = "p-value is about 0.10, therefore the null hypothesis is not rejected."
        elif chi_square < 7.78 and chi_square >= 5.99:
            p_value = "p-value is about 0.20, therefore the null hypothesis is not rejected."
        elif chi_square < 5.99 and chi_square >= 4.88:
            p_value = "p-value is about 0.30, therefore the null hypothesis is not rejected."
        elif chi_square < 4.88 and chi_square >= 3.36:
            p_value = "p-value is about 0.50, therefore the null hypothesis is not rejected."
        elif chi_square < 3.36 and chi_square >= 2.20:
            p_value = "p-value is about 0.70, therefore the null hypothesis is not rejected."
        elif chi_square < 2.20 and chi_square >= 1.65:
            p_value = "p-value is about 0.80, therefore the null hypothesis is not rejected."
        elif chi_square < 1.65 and chi_square >= 1.06:
            p_value = "p-value is about 0.90, therefore the null hypothesis is not rejected."
        elif chi_square < 1.06:
            p_value = "p-value is about 0.95, therefore the null hypothesis is not rejected."
        else:
            p_value = "p-value is less than 0.05, therefore there is enough evidence to reject the null hypothesis."
    
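    else:
        # Only 1 to 4 degrees of freedom are tabulated above.
        raise ValueError("degrees_of_freedom must be between 1 and 4.")
    
    return p_value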
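The table lookup above yields only approximate p-values. As an alternative sketch (not part of the original project), the upper-tail probability of the chi-square distribution can be computed with the standard library alone, using the recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1) for the regularized upper incomplete gamma function, with a = df / 2 and z = chi_square / 2. The function name `chi_square_p_value` and the 0.05 threshold in the usage lines are assumptions. If SciPy is installed, `scipy.stats.chi2.sf(chi_square, degrees_of_freedom)` should give the same quantity.

```python
import math


def chi_square_p_value(chi_square, degrees_of_freedom):
    """Upper-tail probability P(X >= chi_square) for a chi-square distribution."""
    if chi_square < 0 or degrees_of_freedom < 1:
        raise ValueError("chi_square must be >= 0 and degrees_of_freedom >= 1.")
    z = chi_square / 2
    # Closed-form starting points: Q(1, z) for even df, Q(1/2, z) for odd df.
    if degrees_of_freedom % 2 == 0:
        a = 1.0
        q = math.exp(-z)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))
    # Step a up by 1 until it reaches df / 2, adding one recurrence term each time.
    while a < degrees_of_freedom / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


# Usage with an illustrative statistic and an assumed 0.05 significance level.
p_value = chi_square_p_value(0.711, 3)
print("p-value:", round(p_value, 3))
print("reject null hypothesis:", p_value < 0.05)
```

Unlike the lookup table, this approach is not limited to 1 through 4 degrees of freedom.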

In [19]:
# Checking that choose_chi_square works. 
type_chi_square = "jhfk"
choose_chi_square(type_chi_square)

That is not an option.


In [None]:
# This project is available on GitHub.
# Account name: 