# How to utilize the internet for Python

Being *able* to code doesn't mean you are fluent in a certain language and can do everything perfectly. As a medical physicist, it means that, given the resources and references available, you can accomplish your goal, no matter how Frankensteined together the result may be.

An important take-home message from this Python introduction is that...
### For practically anything you could try to do in Python (or any programming language), someone on the internet has already asked and answered the same question, a similar or applicable question, or a question that answers a *part* of your question.

The main challenge is finding those questions and answers. Then, through experience, practice, and exposure to other people's code and methods, you will become a better programmer.

---

## How to Google a problem

We cannot stress enough that every task in coding has been accomplished in some form on the internet. No matter what you're trying to accomplish, there is an answer somewhere.

For example, say you just performed an experiment and have gathered two datasets:

```python
data_group_A = [5, 8, 3, 9, 10]
data_group_B = [12, 5, 9, 15, 13]
```


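From experience in experimental design, you know you want to perform a paired T-Test (appropriate when each value in group A is matched with the value in the same position in group B, for example the same subject measured twice) to determine whether the two groups are statistically different. To figure out how to do this in Python, we can google something along the lines of: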

<center>"Python paired T-Test between two groups of data"</center>

Below are some examples of the Google results I see:

* Forum Posts
    * Stack Overflow [One Sample & Two Sample t-tests in Python](https://stackoverflow.com/questions/60665307/one-sample-two-sample-t-tests-in-python)
    * Kaggle [Paired Samples t-test](https://www.kaggle.com/code/zahrazolghadr/paired-samples-t-test)
    * GitHub [stats-with-python](https://github.com/trangel/stats-with-python/blob/master/notebooks/paired%20t-test.ipynb)
* Tutorial Centers 
    * GeeksforGeeks [How to conduct a Paired Samples T-Test in Python](https://www.geeksforgeeks.org/how-to-conduct-a-paired-samples-t-test-in-python/)
    * DataCamp [Hypothesis Testing in Python](https://campus.datacamp.com/courses/hypothesis-testing-in-python/two-sample-and-anova-tests-2?ex=1)
    * I/O Flood [Python T-Test Guide: Functions, Libraries, Examples](https://ioflood.com/blog/python-t-test/)
* Code Documentation 
    * SciPy package [ttest_ind - SciPy v1.14.1 Manual](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
    * GitHub [t-test](https://github.com/xaviervasques/t-test)
* Related research areas 
    * NeuralDataScience [Basic Statistics in Python: t tests with SciPy](https://neuraldatascience.io/5-eda/ttests.html)
    * ScientificallySound [Calculating sample size for a paired t-test](https://scientificallysound.org/2017/08/03/calculating-sample-size-for-a-paired-t-test/)

Depending on how straightforward or specific the question is, the types of results you get can vary widely. Here, the question was general: we googled the broad topic we are interested in and got back a lot of tutorials demonstrating directly how to perform the t-test.

Say I had already performed a T-Test in Python, knew the general process involved, and knew that the package `SciPy` contains the statistics functions I am looking for. I just don't remember the specific function name or what arguments I need for my experiment. In this case, I would google something along the lines of "Python SciPy T-Test" and get a narrower scope of results, likely with more emphasis on code documentation.
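For the paired design above, a minimal sketch might look like the following. It assumes SciPy is installed. Note that SciPy's paired (related-samples) test is `scipy.stats.ttest_rel`. The `ttest_ind` manual page in the search results is the *independent*-samples version, which is exactly the kind of detail worth double-checking in the documentation.

```python
from scipy import stats

data_group_A = [5, 8, 3, 9, 10]
data_group_B = [12, 5, 9, 15, 13]

# ttest_rel performs a paired (related-samples) t-test:
# element i of group A is compared against element i of group B.
result = stats.ttest_rel(data_group_A, data_group_B)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

If the groups were unrelated (different subjects in each group), `stats.ttest_ind` would be the function to look up instead.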

Now imagine you are deep in a rabbit hole trying to apply deep learning to your research. You are past the general questions and have a specific need. When you google something related to that scenario, you will mostly see documentation and forum posts. The more specific and advanced the question, the more links to `Stack Overflow` or `GitHub` discussion boards you will find.

Let's break down these searches a little further.


---
### The Forum Posts

Forums, such as `Stack Overflow`, are where people post the coding problems they are facing and other experts offer their insights, either as an explanation or as a code snippet that you can copy into your own code. These posts are generally a great place to find answers to questions of any level. That said, they rely on people responding. Unfortunately, you will sometimes run into the classic post where someone asked the exact same question as you *years ago* and no one has ever responded. Or, even more infuriating, the post where someone asks the question and no one answers *except* the original poster, who replies *"Oh, never mind, I figured it out"* with no elaboration...

Otherwise, forum posts are a great way to see how others accomplish the task at hand. Reading someone else's code, working out what it does, and examining the methods used in the snippet will help you grow as a programmer and expand your sense of what is possible.

---
### The Tutorial Centers

Many websites, like `GeeksforGeeks`, are dedicated specifically to providing tutorials on how to accomplish something in a programming language. These pages typically cover a general case and often include an example of its use. They may not *directly* answer your question, but they can provide a fantastic template for doing so, or give you insight into how something works so you can apply it to your situation. Rather than just giving the answer, these tutorials also try to explain what is happening, what a variable is, or what a piece of functionality is doing, so that you understand what is going on.

These are great learning tools when trying to accomplish something in the general sense or expand your understanding of a topic.

---
### Code Documentation

Python packages are (mostly) open source, meaning you can find their definitions, functions, and methods on the internet. Most commonly, they are hosted on `GitHub` or on a website associated with or powered by `GitHub`. When publishing a package, it is common practice to provide detailed documentation of what a function does, its input arguments, its expected output, and often an example of how to use it. Large-scale projects (e.g. `numpy`, `SciPy`, and `Python` itself) all do this thoroughly. Smaller groups (like independent research groups making their code publicly available) often provide documentation and usage examples too. However, this is not a requirement and is up to the author.

In our scenario, if we knew we wanted a SciPy T-Test, we could look specifically at the SciPy documentation to see what it offers and how to use it.

Documentation is a great way to get a complete view of how something works. However, it can sometimes be overwhelming and complex to read. It may be best used once you are already somewhat comfortable with coding, or after reading a tutorial, to get a more complete idea of how it all comes together.
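Most of that documentation is also bundled with the installed package, so you can read it without leaving Python. Here is a short sketch, again assuming SciPy is installed, that uses the standard-library `inspect` module and the built-in `help()`:

```python
import inspect

from scipy import stats

# The call signature: argument names and their default values.
signature = inspect.signature(stats.ttest_rel)
print(signature)

# The first line of the docstring, which summarizes what the function does.
summary = inspect.getdoc(stats.ttest_rel).splitlines()[0]
print(summary)

# Uncomment to print the full docstring, which SciPy's online manual pages are generated from.
# help(stats.ttest_rel)
```

In a Jupyter notebook, typing `stats.ttest_rel?` in a cell shows the same docstring.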

---
### Related areas of research

Some topics you google are closely related to the work of other research groups. In this case, the T-Test is an important tool in all areas of science. Some groups and societies do a lot of programming-based analysis and provide guides (much like the tutorial centers or forum posts) on how to do things in the context of their work. For instance, neuroimaging groups often provide tutorials for analysis and image-based processing, which can be a great resource for scientific programming tasks.


---

## How to Google an error

When you follow a tutorial or get some code from a forum, chances are it will not work perfectly on the first pass. In these cases, you will get an error. The error will most likely be the result of either a Python-level issue or something to do with the package you're using.

Say you try to add a python number and a string together:

```python
a = "2"
b = 4

c = a + b
print(c)
```

```
TypeError: can only concatenate str (not "int") to str
```

The error message contains very helpful information, including where the error originated, the *type* of error that was found, and a message describing the issue.
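To see those pieces separately, the sketch below catches the same error with `try`/`except` and uses the standard-library `traceback` module to print the full traceback. It then builds just the final `ErrorType: message` line, which is the part most worth pasting into a search.

```python
import traceback

a = "2"
b = 4

try:
    c = a + b
except TypeError as error:
    # Full traceback: where the error happened (file and line), plus its type and message.
    print(traceback.format_exc())

    # Only the last line, "ErrorType: message".
    last_line = f"{type(error).__name__}: {error}"
    print(last_line)
```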

These messages are defined by whoever wrote the code that raises them. Here, the issue comes from Python itself: Python's developers decided that this situation raises a `TypeError` with the message `"can only concatenate..."`. Python packages define and report many errors and error messages of their own. To figure out the issue at hand, we can copy and paste the entire error message into Google like so:

<center>"Python TypeError: can only concatenate str (not "int") to str"</center>

For me, one of the first results is from Stack Overflow: ["How to resolve TypeError: can only concatenate str (not 'int') to str"](https://stackoverflow.com/questions/51252580/how-to-resolve-typeerror-can-only-concatenate-str-not-int-to-str)

Someone else had the same issue when trying to add a string and an integer together. The top response to the thread explains in a clear manner what is going wrong and offers a solution to fix it.

```python
# Recommended fix: convert the integer to a string before concatenating
a = "2"
b = 4

c = a + str(b)
print(c)
```

```
24
```


And it works!
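Whether this is the *right* fix depends on what you meant. Converting `b` to a string joins the characters together. If you actually wanted arithmetic, convert `a` to a number instead. Here is a quick comparison:

```python
a = "2"
b = 4

# String concatenation: both values become text and are joined.
joined = a + str(b)

# Arithmetic: the text "2" is converted to the integer 2 before adding.
total = int(a) + b

print(joined)  # 24
print(total)   # 6
```

Both lines run without errors, so only you can decide which one matches the intent of your code.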

Similarly, when an error occurs while using a Python package, googling something along the lines of the following will usually do the trick:

<center>"Python Package_Name Error_Message"</center>

or

<center>"Python Package_Name Function_Name Error_Message"</center>
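If you do this often, you can assemble the query directly from the exception. The helper below, `search_query`, is hypothetical and not part of any library. It uses only the standard library, and as an example it triggers a `StatisticsError` from the built-in `statistics` package. The final link assumes Google's usual `search?q=` URL format.

```python
import statistics
from urllib.parse import quote_plus


def search_query(package_name, function_name, error):
    """Hypothetical helper: build "Python Package_Name Function_Name Error_Message"."""
    error_message = f"{type(error).__name__}: {error}"
    return f"Python {package_name} {function_name} {error_message}"


try:
    statistics.mean([])  # the mean of an empty list is undefined, so this raises
except statistics.StatisticsError as error:
    query = search_query("statistics", "mean", error)

print(query)
# URL-encode the query so it can be pasted into a browser's address bar.
print("https://www.google.com/search?q=" + quote_plus(query))
```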

---

## When the output is not as expected

Sometimes when using a Python package, the output is not what you expect. Say you are collecting data electronically, but a glitch replaces one of the numbers with "NaN" (not a number). You are using the `numpy` package to handle your data, and you go to take the mean of the sample:

```python
import numpy as np

# One of the readings came back as NaN
array = np.array([0, 1, 2, np.nan, 3, 4, 5, 6])
mean = np.mean(array)
print(f'mean={mean}')
```

```
mean=nan
```


You expected a number but got a nan instead! 

We can google the output and see if the internet can help us figure out what is wrong like so:

<center>"Python numpy.mean returns nan"</center>

One of the first results is again from Stack Overflow: [Average of a numpy array returns NaN](https://stackoverflow.com/questions/36224066/average-of-a-numpy-array-returns-nan).

Here, someone is trying to handle `nan` values in their dataset when taking the mean. The top comment explains what is wrong in their implementation and offers the solution on how to handle this scenario:

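```python
# Boolean mask: ~np.isnan(array) is True for every element that is NOT NaN,
# so indexing with it keeps only the valid values before averaging.
mean = np.mean(array[~np.isnan(array)])
print(f'mean={mean}')
```

```
mean=3.0
```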


And it works! The mask drops the NaN value, so the mean is taken over the remaining seven numbers.
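NumPy also provides NaN-aware versions of common reductions, such as `np.nanmean`, which skip missing values for you. Before ignoring NaNs, it is worth counting them, because silently dropping data can hide a problem with the collection itself. Here is a minimal sketch:

```python
import numpy as np

array = np.array([0, 1, 2, np.nan, 3, 4, 5, 6])

# Count the missing values before deciding to ignore them.
n_missing = int(np.isnan(array).sum())

# nanmean skips NaN entries, giving the same result as the boolean-mask approach.
mean = np.nanmean(array)

print(f"missing values: {n_missing}")  # missing values: 1
print(f"mean: {mean}")                 # mean: 3.0
```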

---
## Generative AI

You will have an entire discussion of generative AI in the next portion of the orientation. However, here is a quick note:

Generative AI is a great tool for troubleshooting Python issues, figuring out how to accomplish something, or understanding something better. When I googled the errors above, Google's AI answered every one of them, provided examples, *and* explained what was going on.

Brilliant.

That said, *generative AI is not the truth*. It is simply a prediction of the most probable output for a given question. For simple questions like the ones above, this works great. However, it does not know *everything*. It can easily produce garbage code that it expects to work but that is, in reality, nonsense. Or it could explain what something is doing *with full confidence* and be blatantly wrong.

Be careful when trusting the output. Understanding how coding works, knowing the do's, don'ts, and limitations of what you're trying to do, and having an intuitive idea of what a line of code does will all help you navigate the output of a generative AI and improve your ability as a coder.
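One practical habit is to check generated code against something you can compute yourself. For a paired t-test, the statistic is the mean of the pairwise differences divided by its standard error: `t = mean(d) / (stdev(d) / sqrt(n))`. The sketch below, which assumes SciPy is installed, compares that hand calculation with SciPy's result on the example data from earlier:

```python
import math
import statistics

from scipy import stats

data_group_A = [5, 8, 3, 9, 10]
data_group_B = [12, 5, 9, 15, 13]

# Hand calculation: paired differences, then mean divided by standard error.
differences = [a - b for a, b in zip(data_group_A, data_group_B)]
n = len(differences)
t_by_hand = statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n))

# Library calculation (e.g. what a generated answer suggested).
t_scipy = stats.ttest_rel(data_group_A, data_group_B).statistic

print(f"by hand: {t_by_hand:.4f}")
print(f"SciPy:   {t_scipy:.4f}")
assert math.isclose(t_by_hand, t_scipy), "results disagree: investigate before trusting either"
```

If the two numbers disagree, either the suggested code or your understanding of the test needs a closer look.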

When asking about how to do the T-Test in python, I got the following result (below) from Google's AI...

It provided:
* The function I should use.
* Example code (that runs!) showing how it can be used.
* An explanation of what is going on.
* Guidance on how to interpret the output.

Simply Brilliant.

![Google AI answer on running a paired T-Test in Python](attachment:image.png)