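%matplotlib inline

from js import fetch  # browser fetch API, available in JupyterLite/Pyodide

async def get_csv(url):
    """Download a CSV from `url` and save it locally as 'data.csv'.

    Because this uses the browser's fetch API, run it in a notebook
    cell with `await get_csv(url)`.
    """
    res = await fetch(url)
    text = await res.text()
    filename = 'data.csv'
    # Write the downloaded text to the notebook's local file system
    with open(filename, 'w') as f:
        f.write(text)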

# Group 1 Research Project

## Research Question

How does standard testing for the HLA-B*57:01 star allele in patients receiving abacavir affect the outcome of HIV treatment?

## Hypothesis

The HLA-B*57:01 star allele causes abacavir treatment failure in HIV patients.

## Comment

You are investigating the association of HLA-B*57:01 carrier status with the occurrence of treatment failure in patients with HIV infection.
This is an important area of research and is fairly mature from a clinical perspective; patients should receive testing for HLA-B*57:01 prior to receiving abacavir. 


### Suggestions

- Fully define "treatment failure" in your research paper; this should be straightforward, since there is a very well-characterized risk in patients who are positive for HLA-B*57:01 and receiving abacavir.
- In your paper, describe the mechanistic relationship between genotype and phenotype.
    - Not just *that* the allele causes treatment failure, but *how*?

## Dataset Provided

For your research question, I have provided a *simulated* dataset of 10,000 subjects who had a diagnosis of HIV infection and received abacavir. 
Your dataset contains the following data: 

- ID: A number to identify the subject within the dataset
- Sex: The subject's sex assigned at birth
- Age: The subject's age when abacavir was started 
- Treatment Failure: Whether or not the subject failed treatment with abacavir (note: remember to define this in your paper)

### Statistician Consult

Your dependent variable is treatment failure. 
Your independent variable is carrier status of HLA-B*57:01.
At the bare minimum, you need to compare the frequency of treatment failure stratified by genotype. 
This could be accomplished with a chi-squared test of independence.
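For reference, Pearson's chi-squared statistic compares the observed count $O_{ij}$ in each cell of the genotype-by-outcome table with the count $E_{ij}$ expected if treatment failure were independent of genotype. Here $R_i$ and $C_j$ are the row and column totals and $N$ is the total number of subjects:

$$
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}, \qquad \text{df} = (r-1)(c-1)
$$

For a 2 × 2 table (carrier vs. non-carrier, failure vs. no failure), $\text{df} = (2-1)(2-1) = 1$.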
You can do that with something like this: 

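```python
import pandas as pd
from scipy.stats import chi2_contingency

# Load your data; 'data.csv' is the file written by get_csv() above
df = pd.read_csv('data.csv')

## Make a contingency table ##
# Replace 'variable_1' and 'variable_2' with your actual column names
# (e.g., HLA-B*57:01 carrier status and treatment failure)
contingency = pd.crosstab(df['variable_1'], df['variable_2'])

# Returns the test statistic, p-value, degrees of freedom,
# and the expected frequencies under independence
chi_square, p_value, degrees_of_freedom, expected_frequencies = chi2_contingency(contingency)

print(p_value)
```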
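Before running the test, it can help to look at the stratified frequencies themselves. The sketch below uses a small, made-up DataFrame (not your dataset) with hypothetical column names `hla_b5701_carrier` and `treatment_failure`; substitute the real column names from the CSV. Passing `normalize="index"` to `pd.crosstab` turns each row into proportions, giving the treatment-failure rate within each genotype group.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data for illustration only; NOT the course dataset.
# Column names are assumptions; replace them with the real ones from the CSV.
df = pd.DataFrame({
    "hla_b5701_carrier": ["yes"] * 30 + ["no"] * 70,
    "treatment_failure": ["yes"] * 15 + ["no"] * 15 + ["yes"] * 7 + ["no"] * 63,
})

# Counts of treatment failure within each genotype group
counts = pd.crosstab(df["hla_b5701_carrier"], df["treatment_failure"])

# Row-wise proportions: the treatment-failure rate per genotype group
rates = pd.crosstab(df["hla_b5701_carrier"], df["treatment_failure"], normalize="index")

# For a 2x2 table, SciPy applies Yates' continuity correction by default
chi_square, p_value, dof, expected = chi2_contingency(counts)

print(counts)
print(rates.round(3))
print(f"chi-square = {chi_square:.3f}, dof = {dof}, p = {p_value:.4g}")
```

Checking `expected` is worthwhile: the chi-squared approximation is usually considered reliable when every expected cell count is at least about 5.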

You could also make tables or visualizations of your demographic variables.
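One way to build such demographic tables is sketched below with pandas, using a small made-up DataFrame rather than the real data. The column names `Sex`, `Age`, and especially the genotype column `hla_b5701_carrier` are assumptions; check them against the CSV header.

```python
import pandas as pd

# Hypothetical toy data for illustration only; NOT the course dataset.
# Column names are assumptions; check the CSV header.
df = pd.DataFrame({
    "Sex": ["F", "M", "M", "F", "M", "F", "M", "M"],
    "Age": [34, 45, 29, 52, 41, 38, 60, 27],
    "hla_b5701_carrier": ["no", "no", "yes", "no", "yes", "no", "no", "yes"],
})

# Age summary (count, mean, std, quartiles) within each genotype group
age_summary = df.groupby("hla_b5701_carrier")["Age"].describe()

# Sex distribution by genotype group, with row and column totals
sex_table = pd.crosstab(df["Sex"], df["hla_b5701_carrier"], margins=True)

print(age_summary)
print(sex_table)

# With %matplotlib inline active, a quick visual comparison could be:
# df.boxplot(column="Age", by="hla_b5701_carrier")
```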

Your dataset can be accessed at the following URL:

https://raw.githubusercontent.com/sadams-teaching/PGPM-503-ENV/main/data/projects/group1_data.csv