## Objective 
The objective of this assignment is to implement a Python program using object-oriented programming principles to perform gene expression analysis on liver cancer data.


## Background
Hepatocellular carcinoma (HCC), the most common form of liver cancer, is one of the most common malignant solid tumors and the fourth leading cause of cancer-related deaths worldwide [1]. One promising avenue for gaining insight into the molecular mechanisms underlying liver cancer, and for finding potential diagnostic or therapeutic targets, is the analysis of gene expression data. Gene expression analysis allows us to explore how genes are turned on or off in liver tissue affected by cancer compared to healthy tissue. Understanding these alterations in gene expression patterns can shed light on the biological processes and pathways involved in the development and progression of liver cancer [2].


## Data
Download the liver cancer data `GSE 14520_U133_2.csv` from the website https://sbcb.inf.ufrgs.br/cumida [3]. It contains 22278 genes from 357 samples. **Note** that the samples are in the rows and the genes are in the columns; keep this in mind when structuring the data in the `GeneExpressionData` class.



# Assignment

## Create an Expression Data Class

Implement a `GeneExpressionData` class that will be used to represent and manipulate liver cancer gene expression data. The class should have the following methods:
- `__init__(self, file_path)`: Initialize the class with a file path to the liver cancer gene expression data.

- `load_data(self)`: Read and load the gene expression data from the file into a suitable data structure (e.g., a list of lists, a dictionary, or other suitable data structure).

- `get_gene_names(self)`: Returns a list of gene names from the data.

- `get_expression(self, gene_name)`: Given a gene name, return its expression values across different samples.
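
The method list above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes that the CSV file has a header row and that the first two columns hold a sample identifier and a sample type; the `n_meta_columns` parameter and the small demonstration file are hypothetical and should be checked against the real download.

```python
import csv
import os
import tempfile


class GeneExpressionData:
    """Column-oriented view of a samples-by-genes CSV file."""

    def __init__(self, file_path, n_meta_columns=2):
        # n_meta_columns is a hypothetical parameter: the number of leading
        # non-gene columns (assumed here: sample id and sample type).
        self.file_path = file_path
        self.n_meta_columns = n_meta_columns
        self.expression = {}  # gene name -> list of floats, one per sample

    def load_data(self):
        """Read the file row by row and regroup the values per gene."""
        with open(self.file_path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            genes = header[self.n_meta_columns:]
            self.expression = {gene: [] for gene in genes}
            for row in reader:
                values = row[self.n_meta_columns:]
                for gene, value in zip(genes, values):
                    self.expression[gene].append(float(value))

    def get_gene_names(self):
        """Return the gene names in file order."""
        return list(self.expression)

    def get_expression(self, gene_name):
        """Return the expression values of one gene across all samples."""
        if gene_name not in self.expression:
            raise KeyError(f"Unknown gene: {gene_name}")
        return self.expression[gene_name]


# Tiny made-up file, used only to exercise the class.
demo_csv = "samples,type,geneA,geneB\ns1,HCC,1.0,4.0\ns2,normal,3.0,8.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="") as tmp:
    tmp.write(demo_csv)
    demo_path = tmp.name

data = GeneExpressionData(demo_path)
data.load_data()
gene_names = data.get_gene_names()
gene_a = data.get_expression("geneA")
os.remove(demo_path)
```

Storing one list per gene makes `get_expression` a single dictionary lookup, at the cost of regrouping the rows while loading. A list of rows or a third-party table type would be equally defensible; the choice is part of the design argument the assignment asks for.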

## Create a statistical analysis class

Create a `StatisticalAnalysis` class that will perform statistical analysis on the liver cancer gene expression data. The class should include at least the following methods:

- `__init__(self, gene_expression_data)`: Initialize the class with an instance of the `GeneExpressionData` class.
- `calculate_mean_expression(self, gene)`: Calculate and return the mean expression of a specific gene across all samples.
- `calculate_differential_expression(self, gene_1, gene_2)`: Calculate the differential expression between two genes and return the result.

You can extend this class with other useful methods.
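
A possible shape for this class is sketched below. The assignment does not define "differential expression" precisely, so the sketch assumes it means the difference between the two mean expression values; other definitions (for example a ratio or a log fold change) may be equally valid. `InMemoryExpressionData` is a hypothetical stand-in that only provides `get_expression`, so that the example runs without the data file.

```python
from statistics import fmean


class StatisticalAnalysis:
    """Simple statistics on top of an expression data object."""

    def __init__(self, gene_expression_data):
        # Any object with a get_expression(gene_name) method will work.
        self.data = gene_expression_data

    def calculate_mean_expression(self, gene):
        """Return the mean expression of one gene across all samples."""
        return fmean(self.data.get_expression(gene))

    def calculate_differential_expression(self, gene_1, gene_2):
        """Return mean(gene_1) - mean(gene_2).

        Assumption: "differential expression" is taken to be the
        difference of the two means.
        """
        return (self.calculate_mean_expression(gene_1)
                - self.calculate_mean_expression(gene_2))


class InMemoryExpressionData:
    """Hypothetical stand-in for GeneExpressionData, for demonstration."""

    def __init__(self, expression):
        self._expression = expression

    def get_expression(self, gene_name):
        return self._expression[gene_name]


stub = InMemoryExpressionData({"geneA": [1.0, 3.0], "geneB": [4.0, 8.0]})
analysis = StatisticalAnalysis(stub)
mean_a = analysis.calculate_mean_expression("geneA")
difference = analysis.calculate_differential_expression("geneA", "geneB")
```

Passing the data object into the constructor, rather than a file path, keeps file handling and statistics in separate classes and makes the statistics easy to test with made-up values.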


## Write the main program

Create a main Python program that demonstrates the use of the classes you've created to perform gene expression analysis on liver cancer data. Use a separate module for the classes. The program should also generate and save written reports with the results. It should fetch its arguments from the command line. Make sure that you explain your design choices properly using an argumentative approach.
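
The command-line and report parts of such a program could look roughly like the sketch below, which uses the standard-library `argparse` module. The argument names (`input_file`, `--genes`, `--output`), the report layout, and the numbers in the demonstration are all assumptions made for illustration.

```python
import argparse
import os
import tempfile


def parse_args(argv=None):
    """Parse command-line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Gene expression analysis on liver cancer data.")
    parser.add_argument("input_file",
                        help="path to the expression CSV file")
    parser.add_argument("-g", "--genes", nargs=2, required=True,
                        metavar=("GENE_1", "GENE_2"),
                        help="two genes to compare")
    parser.add_argument("-o", "--output", default="report.txt",
                        help="path of the report to write")
    return parser.parse_args(argv)


def write_report(output_path, results):
    """Write one 'label: value' line per result to a text file."""
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write("Gene expression report\n")
        for label, value in results.items():
            handle.write(f"{label}: {value:.3f}\n")


# Demonstration with an explicit argument list and made-up results.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_output = os.path.join(tmp_dir, "demo_report.txt")
    args = parse_args(["data.csv", "--genes", "geneA", "geneB",
                       "--output", demo_output])
    write_report(args.output, {"mean geneA": 2.0, "mean geneB": 6.0})
    with open(args.output, encoding="utf-8") as handle:
        report_lines = handle.read().splitlines()
```

In the real program these functions would be called from a `main()` function that is run under an `if __name__ == "__main__":` guard, with the classes imported from their own module at the top of the file.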


## Assessment criteria
Your submission will be assessed using the assessment rubric.
Full points will be awarded when you demonstrate that you:
- Can use variables. Can use simple datatypes (str, int, float, bool), nested datatypes and collections (list, tuple, dict, set) and operate on these. Can choose the appropriate datatype for a given algorithm. Can use casting to convert between datatypes. Can use datatypes from modules.
- Develop a deep understanding of conditional statements (if-else, list comprehensions, try-except, generators) and loops (for, while). Utilize flow control to implement algorithms and solve real-world problems efficiently. Can use if/elif/else conditions. Can use for loops on collections. Can use boolean arithmetic. Can use for/while-else.
- Demonstrate proficiency in function scope, local and global variables, and recursion. Apply functions to solve problems effectively, promoting code readability and maintainability. Can use `*args`/`**kwargs`.
- Define classes and objects, understanding the principles of object-oriented programming. Implement class constructors, methods, and properties. Uses inheritance.
- Understand the concept of modular programming and the advantages of using modules. Organize code into separate modules to promote code reusability and maintainability. Import and use external modules and packages effectively. Code is organized logically in modules. Makes use of `if __name__ == '__main__'`. Imports are placed at the top level. Can combine modules into packages.
- Can read input from various sources. Uses string formatting with f-strings to output structured text. Parses data in streaming mode. Implement robust input validation mechanisms to ensure data integrity. Can read from and write to network connections.
- Writes a README and/or help in which the program is explained. Writes pydoc on the module, class, method and function level. Utilize docstrings and annotations to provide comprehensive documentation for functions and classes. Adheres to the PEP8 styling rules. Variable and function names are descriptive and support the structure.
- Uses inline comments to explain decisions in code. Uses an argumentative approach in defending decisions during code reviews. Engages in discussions and debates about the benefits and drawbacks of specific flow control patterns in software. Engages in discussions about the advantages of code modularity, reusability, and maintainability facilitated by modular programming. Demonstrates a critical understanding of security considerations related to input/output operations and makes informed design choices accordingly.
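
Several of these criteria (generators, try-except, streaming parsing, input validation) can be combined in a single reader function. The sketch below is one hypothetical way to do that; the function name `stream_samples`, the `n_meta_columns` parameter, and the assumed layout (sample identifier and sample type in the first two columns) are illustrative only.

```python
import csv
import io


def stream_samples(handle, n_meta_columns=2):
    """Yield (sample_id, expression_values) one row at a time.

    Rows are validated as they are read, so the whole file never has
    to be held in memory. The column layout is an assumption.
    """
    reader = csv.reader(handle)
    try:
        header = next(reader)
    except StopIteration:
        raise ValueError("Input is empty") from None
    n_columns = len(header)
    for row in reader:
        # reader.line_num is the number of source lines read so far.
        if len(row) != n_columns:
            raise ValueError(
                f"Line {reader.line_num}: expected {n_columns} fields, "
                f"got {len(row)}")
        try:
            values = [float(field) for field in row[n_meta_columns:]]
        except ValueError as error:
            raise ValueError(
                f"Line {reader.line_num}: non-numeric expression value"
            ) from error
        yield row[0], values


# Made-up input, held in memory so the example runs without a file.
demo = io.StringIO(
    "samples,type,geneA,geneB\ns1,HCC,1.0,4.0\ns2,normal,3.0,8.0\n")
rows = list(stream_samples(demo))
```

Because the function accepts any open text handle, the same code can read from a file, standard input, or a decoded network stream.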


### References

[1] Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.

[2] Ashekyan O, Shahbazyan N, Bareghamyan Y, Kudryavzeva A, Mandel D, Schmidt M, Loeffler-Wirth H, Uduman M, Chand D, Underwood D, Armen G, Arakelyan A, Nersisyan L, Binder H. Transcriptomic Maps of Colorectal Liver Metastasis: Machine Learning of Gene Activation Patterns and Epigenetic Trajectories in Support of Precision Medicine. Cancers (Basel). 2023 Jul 28;15(15):3835. doi: 10.3390/cancers15153835. PMID: 37568651; PMCID: PMC10417131.

[3] Feltes, B.C.; Chandelier, E.B.; Grisci, B.I.; Dorn, M. CuMiDa: An Extensively Curated Microarray Database for Benchmarking and Testing of Machine Learning Approaches in Cancer Research. Journal of Computational Biology, 2019.
