# 1 Annotating Your Code

## 1.1 Learning Objectives
<ul>
<li>Explain the significance and benefits of code annotation and how it aids in understanding and working with code in the future.</li>
<li>Demonstrate the ability to apply effective code annotation practices by providing clear and concise comments in code.</li>
<li>Recognize the impact of code annotation on code maintainability.</li>
<li>Explain how well-annotated code can facilitate collaboration among developers and ensure code consistency.</li>
<li>Critically evaluate the use of AI in code annotation, including benefits, ethical considerations, and limitations.</li>
<li>Explore how AI can be used to create README files, do line-by-line annotations, and offer potential code improvements.</li>
</ul>

## 1.2 Annotating Code is Good Practice
<p>Code annotation, also known as code commenting, is the process of adding explanatory notes to source code. These notes are used to provide context, clarify functionality, and aid in understanding for developers who may be working on the codebase in the future. Code annotation is an important practice for any developer looking to write clear, efficient, and maintainable code.</p>
<p>There are many benefits to annotating code:</p>
<ul>
<li>Improves readability</li>
<li>Improves maintainability</li>
<li>Improves quality</li>
</ul>

## 1.3 Improves readability
<p>First, annotation can improve the readability of code. By adding comments, developers can explain the purpose of each section of code, which makes it easier for others to understand what the code is doing. This is especially helpful for large or complex codebases, where it can be difficult to keep track of all the different components. It is also important when the original developer of a project moves on to something else and others are left to work on it. Perhaps most often, annotation helps developers remember what they knew about the code when they originally wrote it (annotation is helpful for future you!). If the original developer left well-annotated code, it can drastically improve the ability of others (and of their own future selves) to continue with the project.</p>

### 1.3.1 Examples of readability aiding comments
<p>Comments that help readability <strong>clarify what the code is doing</strong>.</p>
<p><strong>Function explanations:</strong> A comment at the beginning of a function or method can describe its purpose, parameters, and expected return values. This makes it easier for others to understand what the function does and how to use it.</p>
<pre><code># This function calculates and returns the sum of two numbers (x and y)
def add_numbers(x, y):
    return x + y</code></pre>
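<p>In Python, the same function explanation can also be written as a docstring. Unlike a <code>#</code> comment, a docstring is stored on the function object, so tools such as <code>help()</code> and <code>inspect.getdoc()</code> can show it to other developers. A minimal sketch:</p>

```python
import inspect


def add_numbers(x, y):
    """Calculate and return the sum of two numbers (x and y)."""
    return x + y


# The docstring travels with the function, so it can be read at runtime
print(inspect.getdoc(add_numbers))  # Calculate and return the sum of two numbers (x and y).
print(add_numbers(2, 3))  # 5
```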
<p><strong>Inline explanations:</strong> Use inline comments to explain what a specific line or block of code does. This can help someone reading your code to quickly understand what’s going on.</p>
<pre><code># Loop through each item in the list
for item in my_list:
    # Check if the item is greater than 10
    if item &gt; 10:
        # Print the item
        print(item)</code></pre>
<p><strong>TODO comments:</strong> Use TODO comments to mark areas of your code that need further work or improvement. This can help you or others to remember to come back to a specific part of the code.</p>
<pre><code># TODO: Implement error handling for this function
def my_function():
    pass</code></pre>
<p><strong>Parameter descriptions:</strong> If a function or method has complex parameters, it can be helpful to add comments explaining what each parameter does.</p>
<pre><code>def my_function(parameter1, parameter2):
    &quot;&quot;&quot;
    Calculate the sum of two numbers.

    Parameters:
    parameter1 (int): The first number to be added.
    parameter2 (int): The second number to be added.

    Returns:
    int: The sum of parameter1 and parameter2.
    &quot;&quot;&quot;
    return parameter1 + parameter2</code></pre>
<p><strong>Code block summary explanations:</strong> If you have a long or complex code block, you can add a comment to explain what the block is doing.</p>
<pre><code># This code block creates a dictionary containing the counts of each word in a list of text strings
word_counts = {}
for text in text_list:
    for word in text.split():
        if word not in word_counts:
            word_counts[word] = 0
        word_counts[word] += 1</code></pre>
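<p>A block that needs a summary comment is often a good candidate for its own function, where the summary becomes a docstring that also records its limitations. In this sketch, the function name <code>count_words</code> is hypothetical; the body is the block above.</p>

```python
def count_words(text_list):
    """Return a dictionary mapping each word in text_list to its count.

    Words are split on whitespace only, so "Data" and "data," are
    counted as different words.
    """
    word_counts = {}
    for text in text_list:
        for word in text.split():
            # First time we see this word: start its count at zero
            if word not in word_counts:
                word_counts[word] = 0
            word_counts[word] += 1
    return word_counts


print(count_words(["the cat", "the dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```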

## 1.4 Improves maintainability
<p>Code annotation can help to improve the maintainability of code. By adding comments, developers can explain the reasoning behind certain decisions, which can make it easier for others to make changes to the code without breaking it. This can be especially helpful when multiple developers are working on the same codebase, as it can help to prevent conflicts and ensure that the code is always in a consistent state.</p>


### 1.4.1 Examples of maintainability aiding comments
<p>Comments that help maintainability <strong>explain the historical context of why code was made the way it was</strong>.</p>
<p><strong>Design decisions:</strong> If there were specific design decisions made when creating the code, you can add comments explaining why certain choices were made.</p>
<pre><code># We chose to use a linked list data structure for this function to reduce the time complexity of inserting and deleting elements.</code></pre>
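<p>A design-decision comment is most useful right next to the code it justifies. As a hedged Python illustration, the standard library's <code>collections.deque</code> supports O(1) appends and pops at both ends, while <code>list.insert(0, x)</code> and <code>list.pop(0)</code> take O(n) time; the task names below are made up.</p>

```python
from collections import deque

# We chose a deque instead of a list for the task queue: deques append and
# pop at either end in O(1) time, while list.insert(0, x) and list.pop(0)
# must shift every other element (O(n)).
task_queue = deque()

task_queue.append("load data")         # add a task to the back
task_queue.appendleft("check inputs")  # urgent task goes to the front
first_task = task_queue.popleft()      # remove from the front in O(1)
print(first_task)  # check inputs
```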
<p><strong>Legacy code:</strong> Sometimes, code may have been written in a certain way due to constraints or limitations at the time it was created. Adding comments to explain this can help others understand why the code is the way it is.</p>
<pre><code># This code was written before Python 3.3, which introduced the `yield from` syntax. Therefore, we used a `for` loop to iterate over the nested list.</code></pre>
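<p>To see what such a comment protects, here is a sketch of both styles for flattening a list of lists one level deep; the function names are hypothetical. Without the comment, a later maintainer might wonder why the first version avoids <code>yield from</code>, which arrived in Python 3.3.</p>

```python
def flatten_legacy(nested):
    # This code was written before Python 3.3, which introduced the
    # `yield from` syntax. Therefore, we used a `for` loop to iterate
    # over each inner list.
    for inner in nested:
        for item in inner:
            yield item


def flatten_modern(nested):
    # Python 3.3+ equivalent: `yield from` delegates to each inner list
    for inner in nested:
        yield from inner


print(list(flatten_legacy([[1, 2], [3]])))  # [1, 2, 3]
print(list(flatten_modern([[1, 2], [3]])))  # [1, 2, 3]
```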
<p><strong>Performance optimizations:</strong> If certain performance optimizations were made to the code, you can add comments explaining why they were necessary.</p>
<pre><code># We used memoization to improve the time complexity of this recursive function, as it was taking too long to execute for larger inputs.</code></pre>
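<p>Here is a minimal example of that memoization comment in context, using <code>functools.lru_cache</code> from Python's standard library on the classic recursive Fibonacci function (chosen here only for illustration):</p>

```python
from functools import lru_cache


# We used memoization (functools.lru_cache) because the plain recursive
# version recomputes the same subproblems and takes exponential time;
# with caching, each fib(k) is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(50))  # 12586269025
```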
<p><strong>Compatibility considerations:</strong> If the code was written with compatibility considerations in mind, you can add comments explaining why certain choices were made.</p>
<pre><code># We used the `os.path` module to ensure that this code will work on both Windows and Unix-based systems, as the path separators are different on each platform.</code></pre>
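<p>A minimal sketch of that compatibility comment next to the code it explains; the folder and file names are hypothetical:</p>

```python
import os

# We used os.path.join instead of concatenating strings with "/" so this
# code works on both Windows and Unix-based systems: os.path.join inserts
# the correct path separator for the current platform.
data_file = os.path.join("data", "input.txt")
print(data_file)  # data/input.txt on Linux and macOS, data\input.txt on Windows
```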
<p><strong>Limitations:</strong> If there are limitations or edge cases that the code cannot handle, you can add comments to explain this to others.</p>
<pre><code># Note that this function assumes that the input array is sorted in ascending order. If the array is unsorted, the results may be incorrect.</code></pre>
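<p>Here is a hedged example of where such a limitation comment might live, using binary search from Python's standard <code>bisect</code> module; the function name <code>contains</code> is hypothetical. The last call shows the incorrect result the comment warns about.</p>

```python
from bisect import bisect_left


def contains(sorted_values, target):
    """Return True if target is in sorted_values.

    Note that this function assumes that sorted_values is sorted in
    ascending order. If it is unsorted, the binary search below may
    return an incorrect result.
    """
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


print(contains([1, 3, 5, 7], 5))  # True
print(contains([7, 1, 5, 3], 1))  # False, even though 1 is in the list
```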


## 1.5 Improves quality
<p>Code annotation can help to improve the quality of code. By adding comments, developers can identify potential issues or edge cases, which can help to prevent these issues from occurring in the first place. This can be especially helpful when testing code, as it can help to ensure that the code is working as expected.</p>


## 1.6 Examples of quality aiding comments

<p>Comments that help improve the quality of the code do so <strong>by explaining to others how to use it or how to help improve it</strong>.</p>
<p><strong>Error handling:</strong> Adding comments to explain how and why error handling is being implemented can help ensure that your code is robust and able to handle unexpected inputs or errors. This helps others know how the code was intended to be used.</p>
<pre><code># If the input argument is not a list or is empty, raise a ValueError
if not isinstance(input_list, list) or len(input_list) == 0:
    raise ValueError(&quot;Input must be a list that is not empty&quot;)</code></pre>
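<p>The check above is easier to reuse when it sits inside a function whose docstring also documents the error. In this sketch, the <code>average</code> function is hypothetical:</p>

```python
def average(input_list):
    """Return the mean of a non-empty list of numbers.

    Raises:
        ValueError: if input_list is not a list or is empty.
    """
    # If the input argument is not a list or is empty, raise a ValueError
    if not isinstance(input_list, list) or len(input_list) == 0:
        raise ValueError("Input must be a list that is not empty")
    return sum(input_list) / len(input_list)


print(average([2, 4, 6]))  # 4.0
try:
    average([])
except ValueError as err:
    print(f"Caught: {err}")  # Caught: Input must be a list that is not empty
```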
<p><strong>Complexity:</strong> If your code has particular complexities, adding comments that explain them can help others understand its performance characteristics and may help them identify whether there is a simpler way to write it. By adding a comment that expresses uncertainty about the code and asks for suggestions, the author can invite feedback from others on how to improve it.</p>
<pre><code># This block of code could probably be simplified, but I&#39;m not sure how.
new_list = []
for i in old_list:
    if i &gt; 0:
        new_list.append(i)</code></pre>
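<p>A comment like that invites review. One simplification a reviewer might suggest is a list comprehension, which builds the same list; the sample values in <code>old_list</code> are made up for illustration:</p>

```python
old_list = [3, -1, 0, 7, -5]

# Keep only the positive numbers. The list comprehension expresses the
# same loop-and-append logic in a single line.
new_list = [i for i in old_list if i > 0]
print(new_list)  # [3, 7]
```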
<p><strong>Constants and variables:</strong> Adding comments to explain the purpose of constants and variables can make your code easier to use. It not only notifies others of each variable’s existence but also lets them know whether they need to change its value for their own purposes.</p>
<pre><code># This constant represents the maximum allowed number of retries when attempting to connect to the server.
MAX_RETRIES = 3
# This variable tracks the number of failed attempts to connect to the server.
num_retries = 0</code></pre>
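<p>Here is a hedged sketch of a retry loop that uses the commented constant and counter. The connection attempt is passed in as a function, and <code>connect_with_retries</code> is a hypothetical name; a real program would call its own networking code instead.</p>

```python
# This constant represents the maximum allowed number of retries when
# attempting to connect to the server.
MAX_RETRIES = 3


def connect_with_retries(try_to_connect):
    """Call try_to_connect() until it returns True or the retries run out.

    try_to_connect is a hypothetical callable standing in for a real
    connection attempt. Returns the number of failed attempts.
    """
    # This variable tracks the number of failed attempts to connect to the server.
    num_retries = 0
    # One initial attempt plus up to MAX_RETRIES retries
    while not try_to_connect():
        num_retries += 1
        if num_retries > MAX_RETRIES:
            raise ConnectionError(f"Gave up after {MAX_RETRIES} retries")
    return num_retries


# Simulated connection that succeeds on the third try
attempts = iter([False, False, True])
print(connect_with_retries(lambda: next(attempts)))  # 2
```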
<p><strong>Code organization or disorganization:</strong> Adding comments to explain the organization and structure of your code can make it easier for others to navigate and may bring out a better organizational strategy for your code.</p>
<pre><code># Define helper functions for data processing
def preprocess_data(input_data):
    # Implementation details
    ...

def analyze_data(processed_data):
    # Implementation details
    ...

def visualize_data(analyzed_data):
    # Implementation details
    ...
</code></pre>

<pre><code># Load data from file
input_data = load_data(&quot;input.txt&quot;)

# Preprocess data
processed_data = preprocess_data(input_data)

# Analyze data
analyzed_data = analyze_data(processed_data)

# Visualize data
visualize_data(analyzed_data)</code></pre>
<p><strong>Assumptions and constraints:</strong> Adding comments to explain the assumptions and constraints of your code can help others understand its limitations and potential pitfalls.</p>
<pre><code># This function assumes that all input values are positive integers. Negative or non-integer inputs may result in unexpected behavior.</code></pre>

## 1.7 Examples of using AI for annotation
<p>To try out annotation with AI on real code, this author dug up some old code whose purpose she no longer remembered, because time had passed and she hadn’t documented it very well.</p>
### 1.7.1 Ask for a README

<p>First, I wanted ChatGPT to make a README for me. A README is a standard file that explains what a codebase does and summarizes what is happening in it. I asked ChatGPT if it could create one for my code.</p>


# Q1: You should ask ChatGPT the following question
<p>Create a README for this code</p>
<pre><code>   
import random

def generate_password(length=12, use_uppercase=True, use_numbers=True, use_symbols=True):
    lowercase = 'abcdefghijklmnopqrstuvwxyz'
    uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    numbers = '0123456789'
    symbols = '!@#$%^&*()_+-=[]{}|;:,.<>?'

    characters = lowercase
    if use_uppercase:
        characters += uppercase
    if use_numbers:
        characters += numbers
    if use_symbols:
        characters += symbols

    password = ''.join(random.choice(characters) for _ in range(length))
    return password

def main():
    print("Welcome to the Password Generator!")
    length = int(input("Enter the desired password length: "))
    use_uppercase = input("Include uppercase letters? (y/n): ").lower() == 'y'
    use_numbers = input("Include numbers? (y/n): ").lower() == 'y'
    use_symbols = input("Include symbols? (y/n): ").lower() == 'y'

    password = generate_password(length, use_uppercase, use_numbers, use_symbols)
    print(f"Your generated password is: {password}")

if __name__ == "__main__":
    main()
</code></pre>

# A: You should enter ChatGPT’s answer in the cell below

# Optional Q (a more complicated sample): You should ask ChatGPT the following question
<p>Create a README for this code</p>
<pre><code>   
import numpy as np

class NeuralNetwork:
    def __init__(self, layers):
        self.layers = layers
        self.weights = []
        self.biases = []
        for i in range(len(layers) - 1):
            self.weights.append(np.random.randn(layers[i+1], layers[i]))
            self.biases.append(np.random.randn(layers[i+1], 1))

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, z):
        return self.sigmoid(z) * (1 - self.sigmoid(z))

    def feedforward(self, a):
        for w, b in zip(self.weights, self.biases):
            a = self.sigmoid(np.dot(w, a) + b)
        return a

    def backpropagation(self, x, y):
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        # Feedforward
        activation = x
        activations = [x]
        zs = []
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = self.sigmoid(z)
            activations.append(activation)

        # Backward pass
        delta = (activations[-1] - y) * self.sigmoid_derivative(zs[-1])
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        nabla_b[-1] = delta

        for l in range(2, len(self.layers)):
            z = zs[-l]
            sp = self.sigmoid_derivative(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            nabla_b[-l] = delta

        return nabla_w, nabla_b

    def train(self, training_data, epochs, mini_batch_size, learning_rate, test_data=None):
        n = len(training_data)
        for epoch in range(epochs):
            np.random.shuffle(training_data)
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, learning_rate)
            
            if test_data:
                print(f"Epoch {epoch}: {self.evaluate(test_data)} / {len(test_data)}")
            else:
                print(f"Epoch {epoch} complete")

    def update_mini_batch(self, mini_batch, learning_rate):
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        
        for x, y in mini_batch:
            delta_nabla_w, delta_nabla_b = self.backpropagation(x, y)
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        
        self.weights = [w-(learning_rate/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(learning_rate/len(mini_batch))*nb for b, nb in zip(self.biases, nabla_b)]

    def evaluate(self, test_data):
        test_results = [(np.argmax(self.feedforward(x)), y) for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
</code></pre>

<pre><code> 
# Example usage
if __name__ == "__main__":
    # XOR problem
    training_data = [
        (np.array([[0], [0]]), np.array([[0]])),
        (np.array([[0], [1]]), np.array([[1]])),
        (np.array([[1], [0]]), np.array([[1]])),
        (np.array([[1], [1]]), np.array([[0]]))
    ]

    nn = NeuralNetwork([2, 3, 1])
    nn.train(training_data, epochs=10000, mini_batch_size=4, learning_rate=0.1)

    # Test the trained network
    for x, y in training_data:
        output = nn.feedforward(x)
        print(f"Input: {x.T}, Target: {y.T}, Output: {output.T}")    
</code></pre>

# A: You should enter ChatGPT’s answer in the cell below

<p>ChatGPT will give you output that generally looks like a README.</p>
<p>The README may seem accurate given what the code does, but sometimes it doesn’t describe the code well, or it lacks practical information about how to use the code.</p>
<p>So I asked it:</p>

# Q2: You should ask ChatGPT the following question
<p>Can you make this README more succinct and with sections about how you would use this code?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>This yielded a much better result.</p>

### 1.7.2 Ask for line by line annotation
<p>Second, I wanted ChatGPT to annotate code line by line, so I asked it:</p>

# Q3: You should ask ChatGPT the following question, including the code you want annotated so it can give the answer we want
<p>Can you annotate some python code for me?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>It may not only provide helpful annotations but also fix the spacing, for example by adding blank lines between sections of code.</p>

### 1.7.3 Ask for how the code could be improved
<p>This code is old, and I don’t think it runs properly; it is in no way reproducible. So I wanted to see what improvements ChatGPT would recommend. I said:</p>

# Q4: You should ask ChatGPT the following question, including the code you want improved so it can give the answer we want
<p>How could I make this code better?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>It had a lot of great advice:</p>
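<p>For example, one change an AI assistant may suggest for the password generator in Q1 is to use Python’s <code>secrets</code> module, which the Python documentation recommends over <code>random</code> for security-sensitive values such as passwords. A sketch of that change, keeping the original function’s parameters:</p>

```python
import secrets
import string


def generate_password(length=12, use_uppercase=True, use_numbers=True, use_symbols=True):
    """Return a random password built from the selected character sets."""
    characters = string.ascii_lowercase
    if use_uppercase:
        characters += string.ascii_uppercase
    if use_numbers:
        characters += string.digits
    if use_symbols:
        characters += "!@#$%^&*()_+-=[]{}|;:,.<>?"
    # secrets.choice draws from the operating system's secure random source,
    # unlike random.choice, which is not meant for security purposes.
    return "".join(secrets.choice(characters) for _ in range(length))


print(generate_password(16))
```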

### 1.7.4 Sometimes it has trouble with file types if given a URL
<p>Copying and pasting code appears to be the best way to go at this point in time.</p>
<p>I asked ChatGPT if it could annotate code in a Jupyter notebook at a URL I gave it:</p>


# Q5: You should ask ChatGPT the following question
<p>Can you annotate the code at this URL? <a href="https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb" class="uri">https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb</a></p>


# A: You should enter ChatGPT’s answer in the cell below

<p>It may respond with annotations for some of the code in that file, but leave out most of the notebook.</p>
<p>Then you can tell ChatGPT:</p>

# Q6: You should ask ChatGPT the following question
<p>That isn’t all the code at that url</p>

# A: The following answer is just a sample; you can compare it with yours
<p>ChatGPT told me:</p>
<pre><code>You&#39;re right, I apologize. The code at the given URL is actually in a Jupyter Notebook format, so it&#39;s not easy to copy-paste here. However, I can suggest an alternative approach: you can use the website nbviewer (https://nbviewer.jupyter.org/) to view the notebook and its code. Here are the steps to do so:

Go to https://nbviewer.jupyter.org/

Paste the URL of the notebook https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb into the &quot;Enter the location of a Jupyter Notebook&quot; field and press &quot;Go&quot;.

This will display the notebook and its code, and you can read through it and add your own annotations as needed.
I hope this helps!</code></pre>
<p>This was a useful tool to recommend; however, at the end of its response, ChatGPT still told me to add my own annotations.</p>

# A: You should enter ChatGPT’s answer in the cell below

## 1.8 Annotation: Hands-On Exercise
### 1.8.1 The Code
<p>Here’s some R code that hasn’t been annotated particularly well.</p>

<pre><code>library(qqman)
library(beepr)


# The list of datasets

datasets=c(&quot;fivehmc.glmmtmb.all.cing&quot;,&quot;fivehmc.glmmtmb.all.pari&quot;,&quot;fivemc.glmmtmb.cingulate&quot;,&quot;fivemc.glmmtmb.parietal&quot;)

mod=c(&quot;5hmc&quot;,&quot;5hmc&quot;,&quot;5mc&quot;,&quot;5mc&quot;) # For labeling purposes
tissues=rep(c(&quot;Cingulate&quot;,&quot;Parietal&quot;),2) # For labeling purposes

if(dir.exists(paste0(home,&quot;/ManhattanPlots&quot;))==FALSE){ # For storing the graphs
  dir.create(paste0(home,&quot;/ManhattanPlots&quot;))
}
tissues.f=c(&quot;Cingulate&quot;,&quot;Cingulate&quot;,&quot;Parietal&quot;,&quot;Parietal&quot;)
stage=c(&quot;limbic&quot;,&quot;neocortical&quot;,&quot;limbic&quot;,&quot;neocortical&quot;)

fdr.p=data.frame(t(c(rep(0,4))))
colnames(fdr.p)=c(&quot;FC&quot;,&quot;HolP&quot;,&quot;tissues&quot;,&quot;stage&quot;)
fdr.p=fdr.p[-1,]

for(ii in 1:length(fdr.files)){
xx=read.table(paste0(home,&quot;/&quot;,fdr.files[ii]),row.names=1,skip=1)
xx=cbind(xx,rep(tissues.f[ii],nrow(xx)),rep(stage[ii],nrow(xx)))
colnames(xx)=c(&quot;FC&quot;,&quot;HolP&quot;,&quot;tissues&quot;,&quot;stage&quot;)
fdr.p=rbind(fdr.p,xx)
}


for(ii in 1:length(datasets)){
data=eval(parse(text=datasets[ii]))

probes=rownames(data) 

# Match the probe names 
if(ii&lt;3){
yy=fdr.p[which(fdr.p[,3]==tissues[ii]),]
yy=yy[match(probes,rownames(yy)),]
yy=yy[which(!is.na(yy[,1])),]

yy.l=yy[which(yy[,4]==&quot;limbic&quot;),]
yy.n=yy[which(yy[,4]==&quot;neocortical&quot;),]

  probes.l=rownames(yy.l)
}else{
  probes.l=rownames(data) 
}

xx.l=match(probes.l,EPIC.manifest@ranges@NAMES)

chrs.l=EPIC.manifest@elementMetadata@listData$chrmA[xx.l] 
#manhattan function requires chromosomes to be noted as numeric vector with X, Y, and MT chrs being 23:25 respectively. 
chrs.l=gsub(&quot;chr&quot;,&quot;&quot;,chrs.l) 
chrs.l=gsub(&quot;X&quot;,23,chrs.l)
chrs.l=gsub(&quot;Y&quot;,24,chrs.l)
chrs.l=gsub(&quot;MT&quot;,25,chrs.l)

if(ii&lt;3){
probes.n=rownames(yy.n)
xx.n=match(probes.n,EPIC.manifest@ranges@NAMES)

chrs.n=EPIC.manifest@elementMetadata@listData$chrmA[xx.n] 
#manhattan function requires chromosomes to be noted as numeric vector with X, Y, and MT chrs being 23:25 respectively. 
chrs.n=gsub(&quot;chr&quot;,&quot;&quot;,chrs.n) 
chrs.n=gsub(&quot;X&quot;,23,chrs.n)
chrs.n=gsub(&quot;Y&quot;,24,chrs.n)
chrs.n=gsub(&quot;MT&quot;,25,chrs.n)

# Make a dataframe
manh.data.l=data.frame(probes.l,         
           as.numeric(chrs.l),
           EPIC.manifest@ranges@start[xx.l],
           -log10(yy.l[,2]),
           stringsAsFactors = FALSE)

manh.data.n=data.frame(probes.n,         
           as.numeric(chrs.n),
           EPIC.manifest@ranges@start[xx.n],
           -log10(yy.n[,2]),
           stringsAsFactors = FALSE)
colnames(manh.data.l)=colnames(manh.data.n)=c(&quot;CPG&quot;,&quot;CHR&quot;,&quot;BP&quot;,&quot;P&quot;)

}else{
  manh.data.l=data.frame(probes.l,         
                         as.numeric(chrs.l),
                         EPIC.manifest@ranges@start[xx.l],
                         -log10(data$LimbicVSNoneFDR),
                         -log10(data$NeocorticalVSNoneFDR),
                         stringsAsFactors = FALSE)
  colnames(manh.data.l)=c(&quot;CPG&quot;,&quot;CHR&quot;,&quot;BP&quot;,&quot;P&quot;,&quot;NP&quot;)
  manh.data.l=manh.data.l[-which(manh.data.l$NP==Inf),]
  
  xx=unique(c(which(is.nan(manh.data.l$NP)),which(is.nan(manh.data.l$P))))
  if(length(xx)&gt;0){
  manh.data.l=manh.data.l[-xx,]
  }
}

# Label them as such. 

manh.data.l=manh.data.l[-which(manh.data.l$P==Inf),]
manh.data.l$CPG=as.character(manh.data.l$CPG) # CPG&#39;s need to be a character vector
manh.data.l$CHR=as.numeric(manh.data.l$CHR) # Chromosomal locations need to be numeric
manh.data.l$BP= as.numeric(manh.data.l$BP)
manh.data.l=manh.data.l[which(!is.na(manh.data.l$CHR)),]

if(ii&lt;3){
manh.data.n=manh.data.n[-which(manh.data.n$P==Inf),]
manh.data.n$CPG=as.character(manh.data.n$CPG) # CPG&#39;s need to be a character vector
manh.data.n$CHR=as.numeric(manh.data.n$CHR) # Chromosomal locations need to be numeric
manh.data.n$BP= as.numeric(manh.data.n$BP)
manh.data.n=manh.data.n[which(!is.na(manh.data.n$CHR)),]

}

# Manhattan Plot for Limbic data
jpeg(paste0(home,&quot;/ManhattanPlots/&quot;,datasets[ii],&quot;MidManhattan2.jpeg&quot;))
if(ii&lt;3){
sig=manh.data.l$CPG[which(manh.data.l$P&gt;3)]
manhattan(manh.data.l,chr=&quot;CHR&quot;,bp=&quot;BP&quot;,p=&quot;P&quot;,snp=&quot;CPG&quot;,logp=F,ylim=c(0,round(max(manh.data.l$P))+10),highlight=sig,chrlabs=c(1:22, &quot;X&quot;, &quot;Y&quot;),suggestiveline=FALSE,genomewideline=TRUE,main=paste( mod[ii],tissues[ii],&quot;Mid Stage Disease&quot;))
}else{
sig=manh.data.l$CPG[which(manh.data.l$P&gt;3)]
manhattan(manh.data.l,chr=&quot;CHR&quot;,bp=&quot;BP&quot;,p=&quot;P&quot;,snp=&quot;CPG&quot;,ylim=c(0,round(max(manh.data.l$P[!is.na(manh.data.l$P)]))),logp=F,highlight=sig,chrlabs=c(1:22, &quot;X&quot;, &quot;Y&quot;),suggestiveline=FALSE,genomewideline=TRUE,main=paste( mod[ii],tissues[ii],&quot;Mid Stage Disease&quot;))
}
# highlight probes
abline(h=-log10(.001),col=&quot;red&quot;) 
dev.off()

# Manhattan Plot for Neocortical data
jpeg(paste0(home,&quot;/ManhattanPlots/&quot;,datasets[ii],&quot;LateManhattan2.jpeg&quot;))
if(ii&lt;3){
  sig=manh.data.n$CPG[which(manh.data.n$P&gt;3)]
  manhattan(manh.data.n,chr=&quot;CHR&quot;,bp=&quot;BP&quot;,p=&quot;P&quot;,snp=&quot;CPG&quot;,ylim=c(0,round(max(manh.data.l$P))+10),logp=F,highlight=sig,chrlabs=c(1:22, &quot;X&quot;, &quot;Y&quot;),suggestiveline=FALSE,genomewideline=TRUE,main=paste(mod[ii],tissues[ii],&quot;Late Stage Disease&quot;))
}else{
  sig=manh.data.l$CPG[which(manh.data.l$NP&gt;3)]
  manhattan(manh.data.l,chr=&quot;CHR&quot;,bp=&quot;BP&quot;,p=&quot;NP&quot;,snp=&quot;CPG&quot;,ylim=c(0,round(max(manh.data.l$NP))),logp=F,highlight=sig,chrlabs=c(1:22, &quot;X&quot;, &quot;Y&quot;),suggestiveline=FALSE,genomewideline=TRUE,main=paste(mod[ii],tissues[ii],&quot;Late Stage Disease&quot;))
}
  abline(h=-log10(.001),col=&quot;red&quot;)
  dev.off()


}
# Just a nifty way to signal that your graphs are finished being made. 
beep(sound=2)</code></pre>

### 1.8.2 Questions
<ol>
<li><p>Create a README file for this code. Make sure that it includes general purpose of the project, instructions on how to re-run the project, any software required by the project, both input and output file descriptions, and descriptions of any additional tools included in the project.</p></li>
<li><p>How can the annotation for this section be improved?</p></li>
</ol>
<pre><code># Make a dataframe
manh.data.l=data.frame(probes.l,         
           as.numeric(chrs.l),
           EPIC.manifest@ranges@start[xx.l],
           -log10(yy.l[,2]),
           stringsAsFactors = FALSE)

manh.data.n=data.frame(probes.n,         
           as.numeric(chrs.n),
           EPIC.manifest@ranges@start[xx.n],
           -log10(yy.n[,2]),
           stringsAsFactors = FALSE)
colnames(manh.data.l)=colnames(manh.data.n)=c(&quot;CPG&quot;,&quot;CHR&quot;,&quot;BP&quot;,&quot;P&quot;)</code></pre>

# 2 Understanding Unfamiliar Code
## 2.1 Learning Objectives
<ul>
<li>Recognize the benefits, limitations, and assumptions in using AI to understand unfamiliar code</li>
<li>Consider the ethical questions around using AI to understand another person’s code</li>
<li>Practice using AI to summarize code, identify an unknown coding language, and interpret regex and unknown functions</li>
</ul>

## 2.2 Reading Unfamiliar Code Is A Skill
<p>As a programmer, you will frequently encounter situations where you need to read and understand code written by other developers. This could be because you are collaborating on a project, you have taken over maintenance of an existing codebase, or you are learning new syntax or skills that require reading example code. Comprehending code written by someone else is a distinct skill from being able to write your own code, and it takes practice to become proficient at it.</p>
<p>Reading unfamiliar code is like exploring an unfamiliar city without a map or a guide. Just like in a new city, you may not know where to start or how to navigate the codebase. You may encounter unfamiliar syntax, functions, and libraries that you have never seen before, just as you might encounter new streets, buildings, and landmarks. At first, you may feel disoriented and overwhelmed, and may need to spend some time getting oriented and familiarizing yourself with the environment. As you explore, you may start to see patterns and similarities, just as you might begin to recognize neighborhoods and landmarks in a new city. You may also encounter dead-ends, confusing intersections, and unexpected detours, just as you might encounter bugs and errors in the code.</p><p>To make progress in this unfamiliar territory, you may need to rely on a combination of intuition, deduction, and experimentation. You may need to break down the code into smaller pieces, analyze the behavior of each piece, and gradually build up a mental model of how the code works. You may also need to consult documentation, online resources, or other experts who are familiar with the codebase, just as you might ask locals or consult a map in a new city.</p>
<p>In the end, with patience, persistence, and a willingness to learn, you can successfully navigate the unfamiliar code and achieve your goals, just as you can successfully explore a new city and discover its hidden treasures. Learning how to read and understand code written by someone else is a valuable skill that can help you be a more effective programmer. By being able to read and comprehend unfamiliar code, you can contribute to projects more quickly, diagnose and fix bugs more effectively, and learn new programming techniques and best practices. As with any skill, the more you practice reading and understanding code written by others, the more comfortable and proficient you will become at it.</p>



## 2.3 Example One: Summarizing Code You Didn’t Write
<p>Here’s some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don’t know R, you’re relatively new to R, or you’re just having a bad day, understanding what this bit of code does can take a great deal more time and energy. AI chatbots can ease that burden.</p>
<pre><code>library(tidyverse)

d &lt;- ggplot2::diamonds

d_sel &lt;- d %&gt;%
  select(carat, cut, color, clarity, depth, table, price)

d_fil &lt;- d_sel %&gt;%
  filter(price &gt; 5000)

d_bin &lt;- d_fil %&gt;%
  mutate(carat_bins = cut_width(carat, width = 0.2))

d_summ &lt;- d_bin %&gt;%
  group_by(carat_bins, cut, color, clarity) %&gt;%
  summarize(mean_price = mean(price), sd_price = sd(price))</code></pre>
<p>We first ask <a href="https://poe.com/Claude-instant">Claude-instant</a> (as implemented by Poe) to explain the purpose of this code.</p>

# Q7: You should ask ChatGPT the following question
<p>What does this code do?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>You can ask Claude to summarize each step in detail as well.</p>

# Q8: You should ask ChatGPT the following question
<p>Can you explain what each step does in detail?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>Maybe you’ve never worked with the tidyverse package in R - Claude can tell you about it.</p>

# Q9: You should ask ChatGPT the following question
<p>What is the tidyverse package?</p>

# A: You should enter ChatGPT’s answer in the cell below

<p>You might also not be familiar with the diamonds dataset (especially if you’ve never used it before) and want some clarification on where the data are from.</p>

# Q10: You should ask ChatGPT the following question
<p>Can you tell me more about the diamonds dataset?</p>

# A: You should enter ChatGPT’s answer in the cell below

## 2.4 Example Two: Identifying a Coding Language
<p>Sometimes you might have to work with legacy code. Legacy code can be difficult to work with, especially if it is written in a language or style with which you are not familiar. There are more than 700 programming languages in use today, so it is impossible for any programmer to know them all. AI can be a helpful tool for identifying the language and version of legacy code, which can make your life just a little easier.</p>
<p>Let’s look at example code that might have been written decades ago.</p>
<pre><code>program temperature_smog_analysis;

uses
  Math;

const
  n_temperatures = 1000;
  n_smog_measures = 500;

type
  TemperatureArray = array[0..n_temperatures-1] of integer;
  SmogArray = array[0..n_smog_measures-1] of integer;

var
  temperatures: TemperatureArray;
  smog_measures: SmogArray;
  combined_data: array of integer;
  slope, y_intercept: double;
  i, j: integer;

function connect_to_database(filename: string): integer;
begin
  { implementation of connect_to_database function }
end;

function retrieve_temperatures(fd: integer): integer;
begin
  { implementation of retrieve_temperatures function }
end;

function retrieve_smog_measures(fd: integer): integer;
begin
  { implementation of retrieve_smog_measures function }
end;

function retrieve_temperature(fd, index: integer): integer;
begin
  { implementation of retrieve_temperature function }
end;

function retrieve_smog_measure(fd, index: integer): integer;
begin
  { implementation of retrieve_smog_measure function }
end;

procedure close_database(fd: integer);
begin
  { implementation of close_database function }
end;

begin
  { main program body }
end.</code></pre>
<p>We can ask <a href="https://bard.google.com/">Bard</a> to guess which programming language this code is written in.</p>

# Q11: Ask ChatGPT the following question:
<p>What language is this code written in?</p>

# A: Enter ChatGPT’s answer in the cell below

<p>We can dig further to find out which indicators led Bard to identify the language as Pascal.</p>
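<p>The evidence a chatbot cites for this kind of question is usually surface syntax. As a rough illustration of that idea (a toy heuristic, not how Bard or any real language detector works), the Python sketch below searches a few lines of the example for hand-picked Pascal markers; the marker names and patterns are our own assumptions.</p>

```python
import re

# Hand-picked, hypothetical Pascal markers. Real detectors use far richer signals.
PASCAL_MARKERS = {
    "program header": r"^\s*program\s+\w+\s*;",
    "uses clause": r"^\s*uses\b",
    "const/type/var section keywords": r"^\s*(const|type|var)\s*$",
    "begin ... end blocks": r"\bbegin\b[\s\S]*?\bend\s*[;.]",
    "array[low..high] of type": r"\barray\s*\[\s*\w+\s*\.\.",
    "{ brace } comments": r"\{[^}]*\}",
}


def pascal_markers(source):
    """Return the names of the markers that appear in source (case-insensitive)."""
    return [
        name
        for name, pattern in PASCAL_MARKERS.items()
        if re.search(pattern, source, flags=re.IGNORECASE | re.MULTILINE)
    ]


# A few lines taken from the example program.
excerpt = """program temperature_smog_analysis;
uses
  Math;
const
  n_temperatures = 1000;
type
  TemperatureArray = array[0..n_temperatures-1] of integer;
procedure close_database(fd: integer);
begin
  { implementation of close_database function }
end;"""

print(len(pascal_markers(excerpt)), "of", len(PASCAL_MARKERS), "markers found")  # 6 of 6 markers found
print(pascal_markers("def f(x):\n    return x + 1"))  # []
```

<p>No single marker is conclusive (braces and <code>begin</code>/<code>end</code> appear in other languages too); it is the combination that points to Pascal, which is also a good standard for judging a chatbot’s reasoning.</p>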


# Q12: Ask ChatGPT the following question:
<p>How do you know this is Pascal?</p>

# A: Enter ChatGPT’s answer in the cell below

<p>It turns out there are multiple versions of Pascal. Since you will presumably need to work with this legacy code, you might want to know which version it is.</p>
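<p>A short snippet rarely pins down an exact compiler version, but some constructs narrow down the dialect. The sketch below encodes three such hints (our own selection; check them against the compilers’ documentation before relying on them): units and the <code>uses</code> clause are not part of ISO 7185 Standard Pascal, a <code>Math</code> unit ships with Delphi and Free Pascal, and a variable declared as <code>array of integer</code> is a dynamic array, an Object Pascal extension found in Delphi and Free Pascal.</p>

```python
import re

# Hypothetical feature -> dialect hints. None of them identifies an exact version.
DIALECT_HINTS = [
    (r"^\s*uses\b", "uses clause: units are an extension to ISO 7185 (Turbo Pascal, Delphi, Free Pascal)"),
    (r"\buses\b[^;]*\bMath\b", "Math unit: provided by Delphi and Free Pascal"),
    (r"^\s*\w+\s*:\s*array\s+of\s+\w+\s*;", "dynamic array variable: Object Pascal (Delphi, Free Pascal)"),
]


def dialect_hints(source):
    """Return the hint text for every pattern found in source (case-insensitive)."""
    return [
        hint
        for pattern, hint in DIALECT_HINTS
        if re.search(pattern, source, flags=re.IGNORECASE | re.MULTILINE)
    ]


legacy = """uses
  Math;
var
  combined_data: array of integer;"""

for hint in dialect_hints(legacy):
    print(hint)
```

<p>Hints like these are a useful way to test a chatbot’s answer about the version: ask it which specific lines support its conclusion, then check those lines yourself.</p>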

# Q13: Ask ChatGPT the following question:
<p>What version of Pascal is this?</p>

# A: Enter ChatGPT’s answer in the cell below

## 2.5 Example Three: Interpreting Regex
<p>A regex (short for regular expression) is a sequence of characters that defines a search pattern. Regular expressions are a powerful tool for data cleaning, text mining, and data validation, and they are widely used in web development, data science, and other fields where text processing is important. They can also be tricky to read at first, because their syntax is compact and many characters have special meanings.</p>
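<p>To make “a sequence of characters that defines a search pattern” concrete, here is a small sketch of the three uses mentioned above, written with Python’s standard <code>re</code> module (the sample strings are made up). R’s regex functions such as <code>grepl()</code> and <code>gsub()</code> accept similar patterns, though inside R strings each backslash must be doubled (for example <code>"\\d+"</code>).</p>

```python
import re

# Data validation: does the whole string look like a YYYY-MM-DD date?
# (This checks the shape only; it does not check that the date exists.)
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2023-07-14") is not None

# Text mining: pull out every run of digits.
numbers = re.findall(r"\d+", "3 cats, 12 dogs, and 150 fish")

# Data cleaning: collapse runs of whitespace into a single space.
cleaned = re.sub(r"\s+", " ", "too   many \t spaces")

print(is_date, numbers, cleaned)  # True ['3', '12', '150'] too many spaces
```

<p>Here <code>\d</code> means “any digit”, <code>\s</code> means “any whitespace character”, and <code>+</code> and <code>{4}</code> say how many times the preceding item must repeat.</p>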
<p>We can use <a href="https://chat.openai.com/">ChatGPT</a> to explain the regex in this line of someone else’s code.</p>
<pre><code>x = sub(&quot;/d/e&quot;,&quot;/d&quot;,x)</code></pre>
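<p>Assuming this line is R (the call matches R’s <code>sub(pattern, replacement, x)</code>), you can check a chatbot’s explanation by running an equivalent call yourself. The sketch below mimics <code>sub()</code> with Python’s <code>re.sub(..., count=1)</code>, because <code>sub()</code> replaces only the first match (<code>gsub()</code> replaces all of them). The helper name <code>r_sub</code> and the input strings are made up for illustration.</p>

```python
import re


def r_sub(pattern, replacement, x):
    # Mimic R's sub(): replace only the FIRST match of pattern in x.
    return re.sub(pattern, replacement, x, count=1)


# In "/d/e" the slashes and letters are ordinary characters,
# so the pattern matches only the literal text "/d/e".
print(r_sub("/d/e", "/d", "data/d/e/d/e"))  # data/d/d/e
print(r_sub("/d/e", "/d", "data/7/e"))      # data/7/e

# A digit class needs a backslash instead (written "/\\d/e" inside an R string).
print(r_sub(r"/\d/e", "/d", "data/7/e"))    # data/d
```

<p>Small experiments like these are a quick way to confirm whether the chatbot read <code>/d</code> as a literal or as a digit class.</p>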

# Q14: Ask ChatGPT the following question:
<p>What does this do?</p>

# A: Enter ChatGPT’s answer in the cell below

## 2.6 Example Four: Demystifying Functions
<p>Sometimes we are handed code with convoluted structure. Perhaps you are working with code you wrote years ago, when you really liked loops, but present you finds them difficult to parse. (Be nice to past you: maybe you were a baby programmer and didn’t know better.) Instead of giving yourself a headache, you could turn to <a href="https://chat.openai.com/">ChatGPT</a> to explain what your old code does.</p>


<div class="sourceCode" id="cb84"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb84-1"><span class="kw">def</span> my_function(x):</span>
<span id="cb84-2">    result <span class="op">=</span> x</span>
<span id="cb84-3">    <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">10</span>):</span>
<span id="cb84-4">        <span class="cf">for</span> j <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">5</span>):</span>
<span id="cb84-5">            result <span class="op">=</span> result <span class="op">+</span> <span class="dv">2</span> <span class="op">*</span> (i <span class="op">+</span> <span class="dv">1</span>) <span class="op">*</span> (j <span class="op">+</span> <span class="dv">1</span>) <span class="op">*</span> (i <span class="op">%</span> <span class="dv">2</span> <span class="op">==</span> <span class="dv">0</span> <span class="kw">and</span> j <span class="op">%</span> <span class="dv">2</span> <span class="op">==</span> <span class="dv">0</span>) <span class="op">-</span> <span class="dv">1</span></span>
<span id="cb84-6">    <span class="cf">return</span> result</span></code></pre></div>


# Q15: Ask ChatGPT the following question:
<p>What does this function do?</p>

# A: Enter ChatGPT’s answer in the cell below

<p>Now you have an idea of what past you might have been attempting to do with this code. You can also ask the AI about potential problems with using the code as written.</p>

# Q16: Ask ChatGPT the following question:
<p>What are some possible issues with this code?</p>

# A: Enter ChatGPT’s answer in the cell below

<p>(And if you do ever run across an expression like this in your code, you can check out our chapter on <a href="https://hutchdatascience.org/AI_for_Efficient_Programming/refactoring-code.html#refactoring-code">refactoring code</a> using AI to fix it!)</p>
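<p>Whatever explanation or refactor a chatbot offers, you can check it by comparing the original function against the simplified version on a few inputs. The sketch below is one possible refactor (ours, not necessarily what a chatbot will suggest). It relies on the fact that the boolean factor <code>(i % 2 == 0 and j % 2 == 0)</code> counts as 1 only when both loop variables are even, so the loops always add the same constant, 400, to <code>x</code>.</p>

```python
def my_function(x):
    # Original version, copied from above for comparison.
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result


def my_function_refactored(x):
    # The bool factor is 1 only when i and j are both even, so each such pair
    # adds 2 * (i + 1) * (j + 1); every one of the 10 * 5 iterations subtracts 1.
    even_i_total = sum(i + 1 for i in range(0, 10, 2))  # 1 + 3 + 5 + 7 + 9 = 25
    even_j_total = sum(j + 1 for j in range(0, 5, 2))   # 1 + 3 + 5 = 9
    return x + 2 * even_i_total * even_j_total - 10 * 5  # x + 450 - 50 = x + 400


# Equivalence check: both versions should agree on a few sample inputs.
for value in (-3.5, 0, 7):
    assert my_function(value) == my_function_refactored(value)

print(my_function(0))  # 400
```

<p>You do not need to derive constants by hand every time; the point is that a quick equivalence test like this one confirms that a suggested refactor preserves the original behavior.</p>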

## 2.7 Limitations
<p>Although chatbots can be extremely helpful for understanding someone else’s code, they cannot supply context that was never recorded: if the code lacks annotation or documentation, it will be difficult to learn why it was written or how it was developed.</p>
<p>Cases where developers may not have provided context information include:</p>
<ol style="list-style-type: decimal">
<li>Not stating their purpose for writing this code.</li>
<li>Not describing what they planned to do next.</li>
<li>Not describing (or not being aware of) parts of the code that need updating or that may have security or privacy issues.</li>
<li>Not describing how they made certain decisions in the development process.</li>
</ol>
<p>Although we can’t recover this information with certainty, chatbots can still offer educated guesses in response to prompts such as:</p>
<ol style="list-style-type: decimal">
<li>What do you think the purpose of this code is?</li>
<li>What might be possible next steps to further develop this code?</li>
<li>What are possible issues with this code, particularly for security or privacy?</li>
<li>Why was the code possibly written with this structure? What other options are possible?</li>
</ol>
<p>Chatbot tools are also limited by how up to date their training data are, so they may not know about recently discovered problems with the code or with the libraries it uses.</p>


## 2.8 Understanding Unfamiliar Code: Hands-On Exercise
<p>Now it’s your turn to try!</p>

### 2.8.1 The Code
<p>Let’s say you were handed this bit of code to work with (and eventually modify), but first you need to figure out what’s going on with it. Unfortunately, the original programmer left very little in the way of notes or annotation.</p>
<p><strong>Note</strong>: This code is just an example and was written strictly for educational purposes.</p>
<pre><code>use ode_solvers::{Euler, OdeMethod};
use plotters::prelude::*;
use std::fs::File;


const N: f64 = 1000000.0;  
const I0: f64 = 10.0;  
const R0: f64 = 0.0;   
const S0: f64 = N - I0 - R0;  
const BETA: f64 = 0.3;  
const GAMMA: f64 = 0.1;  

fn sir_model(t: f64, y: &amp;[f64], v: f64) -&gt; Vec&lt;f64&gt; {
    let s = y[0];
    let i = y[1];
    let r = y[2];
    let dsdt = -BETA * s * i / N - v * s;
    let didt = BETA * s * i / N - GAMMA * i;
    let drdt = GAMMA * i + v * s;
    vec![dsdt, didt, drdt]
}

fn main() {

    let mut solver = Euler::new(sir_model);
    solver.set_initial_condition(&amp;[S0, I0, R0]);


    let root = BitMapBackend::new(&quot;sir_vaccination.png&quot;, (800, 600)).into_drawing_area();
    root.fill(&amp;WHITE).unwrap();
    let mut chart = ChartBuilder::on(&amp;root)
        .caption(&quot;SIR model with vaccination&quot;, (&quot;sans-serif&quot;, 40))
        .set_label_area_size(LabelAreaPosition::Left, 60)
        .set_label_area_size(LabelAreaPosition::Bottom, 40)
        .build_cartesian_2d(0.0..100.0, 0.0..N)
        .unwrap();

    let mut data1 = vec![];
    for _ in 0..1000 {
        let y = solver.integrate(0.1, 0.0);
        data1.push((solver.time(), y[0], y[1], y[2]));
    }


    let vac_rate = 0.1;   
    let vac_coverage = 0.5;   
    let vac_num = vac_coverage * N;   
    let mut data2 = vec![];
    for i in 0..1000 {
        let t = i as f64 * 0.1;
        let v = if t &gt;= 50.0 &amp;&amp; t &lt; 150.0 {vac_num * vac_rate} else {0.0};
        let y = solver.integrate(0.1, v);
        data2.push((solver.time(), y[0], y[1], y[2]));
    }


    chart
        .configure_mesh()
        .x_labels(10)
        .y_labels(10)
        .disable_x_mesh()
        .disable_y_mesh()
        .draw()
        .unwrap();
    chart
        .draw_series(LineSeries::new(
            data1.iter().map(|d|</code></pre>

### 2.8.2 Questions
<ol style="list-style-type: decimal">
<li><p>Write a query that identifies the features that can be used to determine the coding language, as well as the version of the language. What language and version is it written in? What features give away the coding language?</p></li>
<li><p>Create a prompt that determines what this code does, as well as important features of the program. What does this code do?</p></li>
<li><p>Write a prompt that determines the purpose of this specific code snippet:</p></li>
</ol>
<pre><code> let root = BitMapBackend::new(&quot;sir_vaccination.png&quot;, (800, 600)).into_drawing_area();</code></pre>
<p>What is the purpose of this line of code?</p>
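<p>Once you have your chatbot’s answers, one way to sanity-check its description of <code>sir_model</code> is to integrate the same equations yourself. The Python sketch below is not a translation of the Rust program: it does not use <code>ode_solvers</code> or <code>plotters</code> and draws nothing. It reuses the constants from the snippet, steps the equations forward with a fixed-step Euler method, and assumes <code>v</code> is a per-capita rate moving people from S straight to R. The rate <code>v = 0.01</code> is an arbitrary value chosen for illustration. (In the Rust snippet, <code>v</code> works out to <code>vac_num * vac_rate</code> = 500,000 × 0.1 = 50,000, far too large to be a per-capita rate, which is worth raising with your chatbot.)</p>

```python
# Fixed-step (forward) Euler sketch of the equations in sir_model.
# Assumption: v is a per-capita vaccination rate moving people from S to R.
N = 1_000_000.0
I0 = 10.0
R0 = 0.0
S0 = N - I0 - R0
BETA = 0.3   # transmission rate
GAMMA = 0.1  # recovery rate


def sir_derivatives(s, i, r, v):
    """Return (dS/dt, dI/dt, dR/dt), mirroring the Rust sir_model (r is unused there too)."""
    new_infections = BETA * s * i / N
    return (
        -new_infections - v * s,
        new_infections - GAMMA * i,
        GAMMA * i + v * s,
    )


def simulate(v=0.0, dt=0.1, steps=1000):
    """Integrate from t = 0 with forward Euler; return a list of (t, S, I, R)."""
    s, i, r = S0, I0, R0
    history = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        ds, di, dr = sir_derivatives(s, i, r, v)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        history.append((k * dt, s, i, r))
    return history


no_vax = simulate()
with_vax = simulate(v=0.01)  # hypothetical rate, for illustration only
peak_no_vax = max(i for _, _, i, _ in no_vax)
peak_with_vax = max(i for _, _, i, _ in with_vax)
print(f"peak infections, no vaccination: {peak_no_vax:,.0f}")
print(f"peak infections, v = 0.01:       {peak_with_vax:,.0f}")
```

<p>Because the three derivatives sum to zero, S + I + R should stay at N throughout, and vaccination should lower the infection peak. If a chatbot’s description of the model contradicts either property, that is a sign to question its answer.</p>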