# Description

This notebook tests using the PyGithub package to read files from a GitHub repository that contains a Manubot-based manuscript.

# Modules

In [1]:
from github import Auth, Github
from IPython.display import display
from proj import conf

# Settings/paths

In [2]:
REPO = "pivlab/manubot-ai-editor-code-test-ccc-manuscript"
# Pull request number to inspect. Model associated with each PR:
# PR 2: gpt-3.5-turbo
# PR 3: gpt-4-0125-preview
PR = 2

# Get Repo

In [3]:
auth = Auth.Token(conf.github.API_TOKEN)

In [4]:
g = Github(auth=auth)

In [5]:
repo = g.get_repo(REPO)

# Get Pull Request

In [6]:
pr = repo.get_pull(PR)

In [7]:
list(pr.get_files())

[File(sha="4b1b8489a63dc51e4eba4d71f32b026169e901d1", filename="content/01.abstract.md"),
 File(sha="96b9d8fdf8b314d5156029ea5e5c0727408d6cee", filename="content/02.introduction.md"),
 File(sha="ad101146fd6476ed4bf1fd4bcbbe8fa9e174c2f0", filename="content/04.05.results_intro.md"),
 File(sha="74ea03690a400f1c576007aff728e41fc7fb0faf", filename="content/04.10.results_comp.md"),
 File(sha="5088c6fa5f7d04ffdf17fd2392cc27c9d917c7a0", filename="content/04.12.results_giant.md"),
 File(sha="e0be1fecd6158445a8102f098545b135abf07fed", filename="content/06.discussion.md"),
 File(sha="328b3aac9742ec34d4d0d0fcc42dca439952433d", filename="content/08.01.methods.ccc.md"),
 File(sha="0c5625cac78916c535fe5f1c404846e52c62b59a", filename="content/08.05.methods.data.md"),
 File(sha="16d0a39afea2c2c361967d01214f79d8bbf2d76d", filename="content/08.15.methods.giant.md"),
 File(sha="81b2c0d4c4fca81a3820bd2ac3bb46efdd7b2ca5", filename="content/08.20.methods.mic.md"),
 File(sha="176e79ecda4017d56f4808a244e70a278

In [8]:
pr_commits = list(pr.get_commits())

In [9]:
pr_commits[0].parents

[Commit(sha="0adeb9d709cc9d66e52a325c114605655b1b4923")]

In [10]:
# SHA of the parent of the PR's first commit, i.e. the state before the PR's changes
pr_prev = pr_commits[0].parents[0].sha
print(pr_prev)

0adeb9d709cc9d66e52a325c114605655b1b4923


In [11]:
# SHA of the PR's first commit (also its latest state if the PR has a single commit)
pr_curr = pr_commits[0].sha
print(pr_curr)

bdee3d136aa9e8b6d80b31e926069f9b96e1cac5


# Get file list

In [12]:
# Keep only the Markdown files changed by the PR
pr_files = [f for f in pr.get_files() if f.filename.endswith(".md")]
display(pr_files)

[File(sha="4b1b8489a63dc51e4eba4d71f32b026169e901d1", filename="content/01.abstract.md"),
 File(sha="96b9d8fdf8b314d5156029ea5e5c0727408d6cee", filename="content/02.introduction.md"),
 File(sha="ad101146fd6476ed4bf1fd4bcbbe8fa9e174c2f0", filename="content/04.05.results_intro.md"),
 File(sha="74ea03690a400f1c576007aff728e41fc7fb0faf", filename="content/04.10.results_comp.md"),
 File(sha="5088c6fa5f7d04ffdf17fd2392cc27c9d917c7a0", filename="content/04.12.results_giant.md"),
 File(sha="e0be1fecd6158445a8102f098545b135abf07fed", filename="content/06.discussion.md"),
 File(sha="328b3aac9742ec34d4d0d0fcc42dca439952433d", filename="content/08.01.methods.ccc.md"),
 File(sha="0c5625cac78916c535fe5f1c404846e52c62b59a", filename="content/08.05.methods.data.md"),
 File(sha="16d0a39afea2c2c361967d01214f79d8bbf2d76d", filename="content/08.15.methods.giant.md"),
 File(sha="81b2c0d4c4fca81a3820bd2ac3bb46efdd7b2ca5", filename="content/08.20.methods.mic.md"),
 File(sha="176e79ecda4017d56f4808a244e70a278
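The `.md` filter above keeps every Markdown file the pull request touches, including any file it deletes, and a deleted file has no content at the newer commit. A minimal sketch of a stricter filter follows. It assumes each entry exposes `filename` and `status` attributes, as PyGithub's `File` objects do; the helper name `select_markdown_files` and the stand-in entries are hypothetical, used so the example runs without network access.

```python
from types import SimpleNamespace


def select_markdown_files(files, skip_statuses=("removed",)):
    """Return entries whose filename ends in .md and whose status is not skipped."""
    return [
        f
        for f in files
        if f.filename.endswith(".md") and f.status not in skip_statuses
    ]


# Stand-in objects (hypothetical data) that mimic the two attributes used above.
example_files = [
    SimpleNamespace(filename="content/01.abstract.md", status="modified"),
    SimpleNamespace(filename="content/images/figure.svg", status="modified"),
    SimpleNamespace(filename="content/99.old_section.md", status="removed"),
]

selected = select_markdown_files(example_files)
print([f.filename for f in selected])
```

With a real pull request, `select_markdown_files(pr.get_files())` could take the place of the list comprehension.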

# Get file content

In [13]:
pr_filename = pr_files[-1].filename
display(pr_filename)

'content/20.00.supplementary_material.md'

In [14]:
# File content before the PR's changes
print(repo.get_contents(pr_filename, ref=pr_prev).decoded_content.decode("utf-8"))

## Supplementary material {.page_break_before}

### Supplementary Note 1: Comparison with the Maximal Information Coefficient (MIC) on gene expression data {#sec:mic}

We compared all the coefficients in this study with MIC [@pmid:22174245], a popular nonlinear method that can find complex relationships in data, although very computationally intensive [@doi:10.1098/rsos.201424].
We ran MIC<sub>e</sub> (see Methods) on all possible pairwise comparisons of our 5,000 highly variable genes from whole blood in GTEx v8.
This took 4 days and 19 hours to finish (compared with 9 hours for CCC).
Then we performed the analysis on the distribution of coefficients (the same as in the main text), shown in Figure @fig:dist_coefs_mic.
We verified that CCC and MIC behave similarly in this dataset, with essentially the same distribution but only shifted.
Figure @fig:dist_coefs_mic c shows that these two coefficients relate almost linearly, and both compare very similarly with Pearson and Spearman.

![
*

In [15]:
# File content at the PR commit
print(repo.get_contents(pr_filename, ref=pr_curr).decoded_content.decode("utf-8"))

## Supplementary material {.page_break_before}

### Supplementary Note 1: Comparison with the Maximal Information Coefficient (MIC) on gene expression data {#sec:mic}

We compared the coefficients from our study with the MIC method, which is known for identifying complex relationships in data but is computationally intensive.
We applied MICe to all possible pairwise comparisons of 5,000 highly variable genes from whole blood in GTEx v8.
This process took 4 days and 19 hours to complete, significantly longer than the 9 hours required for CCC.
The analysis of coefficient distribution, as described in the main text, is shown in Figure 1.
We found that CCC and MIC exhibited similar behavior in this dataset, with comparable distributions that were only slightly shifted.
Figure 1c illustrates a nearly linear relationship between these two coefficients, which also showed similarities with Pearson and Spearman correlations.

![
**Distribution of MIC values on gene expression (GTEx v8, whole bl
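Printing both versions in full makes the changes hard to spot by eye. The manuscript appears to keep one sentence per line, so a line-based diff gives a compact view of what changed. The sketch below uses the standard-library `difflib`; the helper name `diff_versions` is hypothetical, and the two short sample strings reuse sentences from the outputs above purely for illustration.

```python
import difflib


def diff_versions(before, after, filename="file"):
    """Return a unified diff between two versions of a text as one string."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{filename} (before)",
        tofile=f"{filename} (after)",
        lineterm="",
    )
    return "\n".join(diff)


# Sample inputs: one unchanged heading line and one revised sentence.
before_text = (
    "## Supplementary material {.page_break_before}\n"
    "This took 4 days and 19 hours to finish (compared with 9 hours for CCC).\n"
)
after_text = (
    "## Supplementary material {.page_break_before}\n"
    "This process took 4 days and 19 hours to complete, "
    "significantly longer than the 9 hours required for CCC.\n"
)

diff_text = diff_versions(
    before_text, after_text, "content/20.00.supplementary_material.md"
)
print(diff_text)
```

In the notebook, the two text arguments would be the decoded strings returned by `repo.get_contents(...)` at `pr_prev` and `pr_curr`.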

# Close connections

In [16]:
g.close()