# Lab Assignment 1: How to Get Yourself Unstuck
## DS 6001: Practice and Application of Data Science

### Nicholas Keeley, nkg3pf, 26 May 2021

### Instructions
Please answer the following questions as completely as possible using text, code, and the results of code as needed. Format your answers in a Jupyter notebook. To receive full credit, make sure you address every part of the problem, and make sure your document is formatted in a clean and professional way.

### Problem 0
Import the following libraries:

```python
import numpy as np
import pandas as pd
import os
import math
```

### Problem 1 
Python is open-source, and that’s beautiful: it means that Python is maintained by a world-wide community of volunteers, that Python develops at the same rate as advancements in science, and that Python is completely free of charge. But one downside of being open-source is that different people design many alternative ways to perform the same task in Python.

Read the following Stack Overflow post: https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas/46912050. The question is simply how to rename the columns of a dataframe using Pandas. Count how many distinct solutions were proposed, and write this number in your lab report. (Hint: the number of solutions is not the number of answers to the posted question.)

Remember: your goal as a data scientist needs to be to process/clean/wrangle/manage data as quickly as possible while still doing it correctly. A big part of that job is knowing how to seek help to find the right answer quickly. Given the number of proposed solutions on this Stack Overflow page, what’s the problem with developing a habit of using Google and Stack Overflow as your first source for seeking help? (2 points)

### *Answer P1:*

*I counted 11 distinct or partially distinct solutions to this problem. That said, I did not test the solutions myself, and several of them were complex enough to require a significant amount of coding effort. This highlights the problem with using Google and Stack Overflow as a first source for help: there are countless ways to solve any problem in Python, and the posted solutions cater to a wide range of programming ability levels and backgrounds. Furthermore, there is no way to tell whether an "answer" is actually a working "solution" without testing it yourself, which takes a long time when there are this many candidates.*
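To make the point concrete, the sketch below shows several common pandas idioms that all rename the same two columns and give the same result. The column names `$a` and `$b` are hypothetical. This list is illustrative and not a complete inventory of what the thread proposes.

```python
import pandas as pd

# Small example frame with hypothetical column names.
df = pd.DataFrame({"$a": [1, 2], "$b": [3, 4]})

# 1. Assign a new list directly to the .columns attribute.
df1 = df.copy()
df1.columns = ["a", "b"]

# 2. Map old names to new names with DataFrame.rename.
df2 = df.rename(columns={"$a": "a", "$b": "b"})

# 3. Pass a function to rename; it is applied to every column name.
df3 = df.rename(columns=lambda c: c.lstrip("$"))

# 4. Use vectorized string methods on the column Index
#    (regex=False so "$" is treated literally, not as an end-of-string anchor).
df4 = df.copy()
df4.columns = df4.columns.str.replace("$", "", regex=False)

# 5. Replace the whole column axis in one call.
df5 = df.set_axis(["a", "b"], axis=1)

for d in (df1, df2, df3, df4, df5):
    print(list(d.columns))
```

Each variant prints `['a', 'b']`. They differ in whether they modify a frame in place and whether they need the full list of new names. Those differences are exactly what makes choosing among them on a forum slow.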

### Problem 2
There are several functions implemented in Python to calculate a logarithm. Both the `numpy` and `math` libraries have a `log()` function. Your task in this problem is to calculate log$_3(7)$ directly (without using the change-of-base formula). Note that this particular log has a base of 3, which is unusual. For this problem:

* Write code to display the docstrings for each function.

* Read the docstrings and explain, in words in your lab report, whether it is possible to use each function to calculate log$_3(7)$ or not. Why did you come to this conclusion?

If possible, use one or both functions to calculate log$_3(7)$ and display the output. (2 points)

```python
# Math log function docstring.
print(math.log.__doc__)
```

```
log(x, [base=math.e])
Return the logarithm of x to the given base.

If the base not specified, returns the natural logarithm (base e) of x.
```

```python
# Numpy log function docstring.
print(np.log.__doc__)
```

```
log(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Natural logarithm, element-wise.

The natural logarithm `log` is the inverse of the exponential function,
so that `log(exp(x)) = x`. The natural logarithm is logarithm in base
`e`.

Parameters
----------
x : array_like
    Input value.
out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or None,
    a freshly-allocated array is returned. A tuple (possible only as a
    keyword argument) must have length equal to the number of outputs.
where : array_like, optional
    This condition is broadcast over the input. At locations where the
    condition is True, the `out` array will be set to the ufunc result.
    Elsewhere, the `out` array will retain its original value.
    Note that if an uninitialized `out` array is created via the default
```


```python
# Math version.
result = math.log(7, 3)
print("The answer is " + str(result))
```

```
The answer is 1.7712437491614221
```


### *Answer P2:*

*Based on the docstrings, `math.log` can calculate log$_3(7)$ directly, while `np.log` cannot, because it would require the change-of-base formula. The `math.log` docstring lists `base` as an optional parameter that defaults to $e$. The `np.log` docstring has no `base` parameter. Instead, it describes an element-wise natural logarithm (base $e$) applied to array-like input.*
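For comparison, the sketch below computes the same value three ways and checks that they agree. `np.emath.logn(n, x)` is a NumPy function that this problem does not mention. It accepts a base, but it computes a ratio of natural logs internally, so whether it counts as "direct" here is a judgment call.

```python
import math
import numpy as np

# math.log takes an optional second argument: the base.
direct = math.log(7, 3)

# np.log has no base parameter, so it needs the change-of-base formula:
# log_3(7) = ln(7) / ln(3).
via_change_of_base = float(np.log(7) / np.log(3))

# np.emath.logn(n, x) returns the log base n of x
# (implemented as a ratio of natural logs).
via_emath = float(np.emath.logn(3, 7))

print(direct, via_change_of_base, via_emath)

# Sanity check: 3 raised to the result should give back approximately 7.
print(3 ** direct)
```

All three values should agree up to floating-point rounding. The last line should print a number very close to 7.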




### Problem 3
Open a console window and place it next to your notebook in JupyterLab. Load the kernel from the notebook into the console, then call up the docstring for the `pd.DataFrame` function. Take a screenshot and include it in your lab report. To include a locally saved image named `screenshot.jpg`, for example, create a Markdown cell and paste:
```
<img src="screenshot.jpg" width=600>
```
(2 points)

### *Answer P3:*


<img src="screenshot.jpg" width=600>
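In the console, typing `pd.DataFrame?` (IPython syntax) or calling `help(pd.DataFrame)` displays the docstring. To pull the same text into the notebook itself, a minimal sketch uses the standard-library `inspect` module. Printing only the first few lines is an arbitrary choice, made to keep the output short.

```python
import inspect
import pandas as pd

# inspect.getdoc returns the cleaned-up docstring (or None if there is none).
doc = inspect.getdoc(pd.DataFrame)

# The full DataFrame docstring is long, so show only its first few lines.
print("\n".join(doc.splitlines()[:5]))
```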


### Problem 4
Search through the questions on Stack Overflow tagged as Python questions: https://stackoverflow.com/questions/tagged/python. Find a question in which an answerer exhibits passive toxic behavior as defined in this module's notebook. Provide a link, and describe what specific behavior leads you to identify this answer as toxic. (2 points)

### *Answer P4:*

*In the comments on the following post (https://stackoverflow.com/questions/26724585/what-is-perls-equivalent-of-pythons-time-time), a commenter exhibits toxic behavior while framing it as rare. He states that he "rarely berates people for not Googling," and then shames the poster for not entering the query into Google or searching the documentation. Unsurprisingly, the poster replies that he or she had already tried Googling the question but could not find a suitable answer.*



### Problem 5
Search through the questions on Stack Overflow tagged as Python questions: https://stackoverflow.com/questions/tagged/python. Find a question in which a questioner self-sabotages by asking the question in a way that the community does not appreciate. Provide a link, and describe what the questioner did specifically to annoy the community of answerers. (2 points)

### *Answer P5:*

*In the following post (https://stackoverflow.com/questions/67708356/hi-can-somebody-tell-me-whats-wrong-with-my-code-im-16-and-im-learning-pyht), the questioner, who mentions being 16 in the title, appears to be new to asking questions on Stack Overflow. As a result, he or she makes several mistakes: 1) does not provide context or the desired outcome for the code, 2) does not include any error messages, and 3) does not describe any debugging attempts. These mistakes are self-sabotage, and the post has been downvoted multiple times.*

### Problem 6
These days there are so many Marvel superheroes, but only six superheroes count as original Avengers: Hulk, Captain America, Iron Man, Black Widow, Hawkeye, and Thor. I wrote a function, `is_avenger()`, that takes a string as an input. The function looks to see if this string is the name of one of the original six Avengers. If so, it prints that the string is an original Avenger, and if not, it prints that the string is not an original Avenger. Here's the code for the function:

```python
def is_avenger(name):
    if name=="Hulk" or "Captain America" or "Iron Man" or "Black Widow" or "Hawkeye" or "Thor":
        print(name  + "'s an original Avenger!")
    else:
        print(name + " is NOT an original Avenger.")
```

To test whether this function is working, I pass the names of some original Avengers to the function:

```python
is_avenger("Black Widow")
```

```
Black Widow's an original Avenger!
```

```python
is_avenger("Iron Man")
```

```
Iron Man's an original Avenger!
```

```python
is_avenger("Hulk")
```

```
Hulk's an original Avenger!
```


Looks good! But next, I pass some other strings to the function:

```python
is_avenger("Spiderman")
```

```
Spiderman's an original Avenger!
```

```python
is_avenger("Beyonce")
```

```
Beyonce's an original Avenger!
```


Beyonce is a hero, but she was too busy going on tour to be in the Avengers movie. Also, Spiderman definitely was NOT an original Avenger. It turns out that this function reports that any string we pass in is an original Avenger, which is incorrect. To fix this function, let's turn to Stack Overflow.
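The cause is operator precedence combined with truthiness. `==` binds more tightly than `or`, so the condition is evaluated as `(name == "Hulk") or "Captain America" or ...`. `or` then returns its first truthy operand, and any non-empty string is truthy, so the `if` branch always runs. The sketch below checks this with a shortened version of the condition; using three names instead of six is only for brevity.

```python
name = "Beyonce"

# Parsed as (name == "Hulk") or "Captain America" or "Iron Man",
# because == has higher precedence than or.
condition = name == "Hulk" or "Captain America" or "Iron Man"

# `or` returns the first truthy operand: here, the string "Captain America".
print(repr(condition))  # 'Captain America'

# A non-empty string is truthy, so the if-branch would always run.
print(bool(condition))  # True
```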

#### Part a
The first step to solving a problem using Stack Overflow is to do a comprehensive search of available resources to try to solve the problem. There is a post on Stack Overflow that very specifically solves our problem. Do a Google search and find this post. In your lab report, write the link to this Stack Overflow page, and the search terms you entered into Google to find this page.

Then apply the solution on this Stack Overflow page to fix the `is_avenger()` function, and test the function to confirm that it works as we expect. (2 points)

#### Part b
Suppose that no Stack Overflow posts yet existed to help us solve this problem. It would be time to consider writing a post ourselves. In your lab report, write a good title for this post. Do NOT copy the title of the posts you found for part a. (Hint: for details on how to write a good title see the slides or https://stackoverflow.com/help/how-to-ask) (3 points)

#### Part c
One characteristic of a Stack Overflow post that is likely to get good responses is a minimal working example. A minimal working example is code with the following properties:

1. It can be executed on anyone’s local machine without needing a data file or a hard-to-get package or module

2. It always produces the problematic output

3. It uses as few lines of code as possible, and is written in the simplest way to write that code

Write a minimal working example for this problem. (3 points)

### *Answer P6a:*

*The link to the solution for this problem is https://stackoverflow.com/questions/12335382/multiple-conditions-with-if-elif-statements. The search terms I used to find this page were "python if else statement multiple or conditions always returns true stack overflow"; I added "stack overflow" to get results from Stack Overflow specifically. Implementing the solution (shown in the code below) correctly rejects the Beyonce test case.*

```python
# Second attempt with SO solution.

def is_avenger(name):
    if (name == "Hulk" or name == "Captain America" or name == "Iron Man"
            or name == "Black Widow" or name == "Hawkeye" or name == "Thor"):
        print(name + "'s an original Avenger!")
    else:
        print(name + " is NOT an original Avenger.")

is_avenger("Beyonce")
```

```
Beyonce is NOT an original Avenger.
```
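The explicit `or` chain works, but it repeats `name ==` six times. A common, more compact alternative is a membership test with `in` against a set of names. This is not necessarily the approach the linked post recommends, and the constant name `ORIGINAL_AVENGERS` is hypothetical. The sketch also tests both matching and non-matching names, since checking only Beyonce does not confirm that real Avengers still pass.

```python
# Hypothetical constant: the six original Avengers.
ORIGINAL_AVENGERS = {"Hulk", "Captain America", "Iron Man",
                     "Black Widow", "Hawkeye", "Thor"}

def is_avenger(name):
    # `in` checks set membership, so no chain of == comparisons is needed.
    if name in ORIGINAL_AVENGERS:
        print(name + "'s an original Avenger!")
    else:
        print(name + " is NOT an original Avenger.")

# Test names that should match and names that should not.
for name in ["Black Widow", "Iron Man", "Hulk", "Spiderman", "Beyonce"]:
    is_avenger(name)
```

This should print the "original Avenger!" message for the first three names and the "is NOT an original Avenger." message for Spiderman and Beyonce.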


### *Answer P6b:*

*I would choose the following title: "If/else statement with multiple `or` conditions not working: always returns True". This title distills the problem, names the syntax being debugged, and states what makes this problem specific (multiple `or` conditions).*

### *Answer P6c:*

*See the code below for a minimal working example:*

```python
# Problematic code.
def is_avenger(name):
    if name=="Hulk" or "Captain America" or "Iron Man" or "Black Widow" or "Hawkeye" or "Thor":
        print(name  + "'s an original Avenger!")
    else:
        print(name + " is NOT an original Avenger.")

# Use case that should print "is an original Avenger!"
correct = "Hulk"

# Use case that should NOT print "is an original Avenger!" but does.
incorrect = "Beyonce"

### Test ###
is_avenger(incorrect)
```

```
Beyonce's an original Avenger!
```
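Property 3 above asks for as few lines as possible. The function wrapper and all six names are not needed to reproduce the bug, so an even shorter reproduction might look like the sketch below. The message `"match"` is an arbitrary placeholder.

```python
name = "Beyonce"

# Expected: nothing is printed, because name is not "Hulk".
# Actual: "match" is printed, because the non-empty string "Thor" is truthy.
if name == "Hulk" or "Thor":
    print("match")
```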


### Problem 7
Sign on to the PySlackers Slack workspace and send me a private message in which you tell me which three channels on that Slack workspace look most interesting to you. (2 points)

### *Answer P7:*

*Please see my message on Slack, where I discuss NLP and related channels.*