# Packages, Objects and Functions

> Python has long maintained the philosophy of **"batteries included"** -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.
>
> \- PEP 206

## Applied Review

### Fundamentals

- Python's common *atomic*, or basic, data types are:
    - Integers
    - Floats (decimals)
    - Strings
    - Booleans

- These simple types can be combined to form more complex types, including:
    - Lists: Ordered collections
    - Dictionaries: Key-value pairs
    - DataFrames: Tabular datasets

## Packages (aka *Modules*)

So far we've seen several data types that Python offers out-of-the-box.
However, to keep things organized, some Python functionality is stored in standalone *packages*, or libraries of code.
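The word "module" is often used interchangeably with package; strictly speaking, a module is a single file of Python code and a package is a collection of modules, but you will hear both in discussions of Python.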

For example, functionality related to the operating system -- such as creating files and folders -- is stored in a package called `os`.
To use the tools in `os`, we *import* the package.

In [None]:
import os

Once we import it, we gain access to everything inside.
With Jupyter's autocomplete, we can view what's available.

In [None]:
# Move your cursor to the end of the below line and press tab.
os.

Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.
Collectively, this group of packages is known as the *standard library*.
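
As a minimal sketch of what `os` offers, the cell below creates a folder and lists the contents of its parent directory. It works inside a temporary directory (using `tempfile`, another standard library package) so nothing on your machine is changed; the folder name `demo_folder` is made up for illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a path in a way that works on every operating system.
    new_folder = os.path.join(tmp_dir, "demo_folder")  # hypothetical folder name

    os.mkdir(new_folder)                       # create the folder
    folder_exists = os.path.isdir(new_folder)  # check that it is really there
    contents = os.listdir(tmp_dir)             # names of everything inside tmp_dir

print(folder_exists)
print(contents)
```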

Other packages must be downloaded separately, either because
- they aren't sufficiently popular to merit inclusion in the standard library
- *or* they change too quickly for the maintainers of Python to keep up

The DataFrame type that we saw earlier is part of the `pandas` package (short for *Panel Data*).
Since pandas is specific to data science and is still rapidly evolving, it is not part of the standard library.

We can download packages like pandas from the internet using a website called [PyPI](https://pypi.org/), the *Python Package Index*.

As a convenience, the Azure ML Workbench environment we are using comes with pandas pre-installed.
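
If you are unsure whether a package is available in your environment, one possible check uses the standard library's `importlib.util.find_spec`, which returns `None` when Python cannot find a top-level package with the given name. The helper name `is_installed` and the nonsense package name below are assumptions made for this sketch. A missing package is usually installed from PyPI with `pip install <package name>`.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# "os" is part of the standard library, so it should always be found.
print(is_installed("os"))

# A made-up name that should not exist in any environment.
print(is_installed("surely_not_a_real_package_xyz"))
```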

It's possible to import packages under an *alias*, or a nickname.
The community has adopted certain conventions for aliases for common packages;
while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand.

pandas is conventionally imported under the alias `pd`.

In [None]:
import pandas as pd

In [None]:
# Importing pandas has given us access to the DataFrame, accessible as pd.DataFrame
pd.DataFrame
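
An alias is only a second name for the same package; it does not create a copy. The sketch below illustrates this with the standard library's `math` package, so it runs without any third-party installs. The alias `m` is an arbitrary choice for this example, not a community convention.

```python
import math
import math as m  # "m" is an arbitrary alias chosen for this example

# Both names refer to the very same package.
print(m is math)   # True

# Anything reachable through the full name is reachable through the alias.
print(m.sqrt(16))  # 4.0
```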

Third-party packages unlock a huge range of functionality that isn't available in native Python; many of Python's data science capabilities come from a handful of packages outside the standard library:

- pandas
- numpy (numerical computing)
- scikit-learn (predictive modeling)
- matplotlib (graphing)
- altair (interactive data visualization)
- tensorflow (deep learning)

## Your Turn

<img src="images/exercise.png" style="width: 1000px;"/>

<font class="your_turn">
    Your Turn
</font>

1. Import the `datetime` library. Make sure the library can be accessed via the alias `dt`.
2. Use Jupyter's tab completion to display everything inside the datetime library.
3. Run `dt.datetime.now()` to create a datetime object that represents the current moment. Store this object in a variable named `jetzt`.
4. What type is the object stored in the variable `jetzt`?
5. Use tab completion to display all the options of the `jetzt` object. Try to find the current year, month, and day with code, and store them in the variables `jahr`, `monat`, and `tag`.

*Hint: Remember to first write the name of the imported library, followed by a `.` and then the name of the function in the library; in this case, `dt.`. Then, with the cursor directly after the `.`, press the Tab key to trigger autocompletion and show all available options.*

*The same approach can be applied to the `jetzt` object.*

#<font color='white'>
# Solution
import datetime as dt
jetzt = dt.datetime.now()
print(type(jetzt))  # answers step 4: a datetime.datetime object
jahr = jetzt.year
monat = jetzt.month
tag = jetzt.day
print(jahr, monat, tag)
#</font>

#<font color='white'>
# Hour, minute, second
zeitstempel = jetzt.time()
stunde = zeitstempel.hour
minute = zeitstempel.minute
sekunde = zeitstempel.second
print(stunde, minute, sekunde)
#</font>

## Dot Notation with Packages

We have already seen dot notation a few times, and used it in the exercise just now, so it's time to discuss it explicitly:
things inside packages can be accessed with *dot notation*.

Dot notation looks like this:
```python
pd.Series
```

or
```python
import datetime as dt
dt.datetime.now()
```

or
```python
import numpy as np
np.array
```

You can read this as "the `array` function, within the NumPy library".

**Packages can contain pretty much anything** that's legal in Python;
if it's code, it can be in a package.

This flexibility is part of the reason that Python's package ecosystem is so expansive and powerful.
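
To make "pretty much anything" concrete, here is a small sketch that uses dot notation to reach four different kinds of things inside standard library packages. The `inspect` package is used only to check what each item is; the particular items were picked for illustration.

```python
import datetime as dt
import inspect
import math
import os

print(type(math.pi))                  # a plain number (a float)
print(callable(math.sqrt))            # a function, so it can be called
print(inspect.isclass(dt.datetime))   # a class, i.e. a type of object
print(inspect.ismodule(os.path))      # another module nested inside os
```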

## Functions

As you may have noticed already, occasionally we run code using parentheses `()`.
The feature that permits this in Python is **functions** -- code snippets wrapped up into a single name.

For example, take the `type` function we saw above.
```python
type(x)
```

`type` does some complex things under the hood -- it looks at the variable inside the parentheses, determines what type of thing it is, and then returns that type to the user.

In [None]:
x = 7
type(x)

But the beauty of `type`, and of all functions, is that you (as the user) don't need to know all the complex code that's necessary to figure out that x is an `int` -- you just need to remember that there's a `type` function to do that for you.
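
As a quick illustration, the same one-line call works on every kind of value from the review at the top of this lesson. The example values are made up, and `__name__` is used here only to print the short name of each type.

```python
# One value of each built-in type covered in the review.
examples = [7, 7.0, "7", True, [7], {"seven": 7}]

# Ask type() about each value and keep the short name of the result.
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    print(value, "is of type", name)
```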

Functions make you much more powerful, as they unlock lots of functionality within a simple interface.

```python
# Get the first few rows of the movies data.
movies.head()
```

```python
# Read in the movies.csv file.
pd.read_csv('../data/movies.csv')
```

The values within the parentheses are called function arguments, or simply **arguments**.

Above, the string `'../data/movies.csv'` is the argument to the `pd.read_csv` function.
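
Many functions accept more than one argument, and arguments can be passed either by position or by name. The sketch below uses Python built-ins, chosen only because they need no data files; the input values are made up.

```python
# Two positional arguments: the number to round and the number of decimals.
rounded = round(3.14159, 2)

# One positional argument plus one argument passed by name.
descending = sorted([3, 1, 2], reverse=True)

# Several positional arguments plus a named one that says how to compare them.
longest = max("pandas", "os", "numpy", key=len)

print(rounded)      # 3.14
print(descending)   # [3, 2, 1]
print(longest)      # pandas
```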

Functions are integral to using Python, because it's much more efficient to use pre-written code than to always write your own.

If you want to write your own functions -- perhaps to share with others, or to make it easier to reuse your work -- it's fairly simple.

Let's take a look at how.

You can create your own functions like this:
```python
def my_function():
    result = "Python is awesome!"
    return result
```

And then call them like so:
```python
my_function()
```

Let's do this interactively.

In [None]:
# Define the function

# Then call it


Functions can also take inputs: 
```python
def my_function_with_inputs(input_name, input_hour_of_day):
    # Perform operations on input arguments
    uppercase_name = input_name.upper()  # Make name uppercase
    
    if input_hour_of_day < 12:
        result = f"Good morning, {uppercase_name}."
    else:
        result = f"Good afternoon, {uppercase_name}."
  
    return result
```
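
A function with inputs is called by supplying one value per argument, either by position or by name. The sketch below repeats the definition so that it runs on its own, then shows what happens when an argument is left out: Python raises a `TypeError`. The name `"Ada"` and the hours are made-up example values.

```python
def my_function_with_inputs(input_name, input_hour_of_day):
    uppercase_name = input_name.upper()  # Make name uppercase

    if input_hour_of_day < 12:
        result = f"Good morning, {uppercase_name}."
    else:
        result = f"Good afternoon, {uppercase_name}."

    return result

# Arguments passed by position: first the name, then the hour.
morning = my_function_with_inputs("Ada", 9)

# Arguments passed by name: the order no longer matters.
afternoon = my_function_with_inputs(input_hour_of_day=15, input_name="Ada")

print(morning)
print(afternoon)

# Leaving out a required argument is an error, not a silent default.
try:
    my_function_with_inputs("Ada")
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
```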

Again, let's do this interactively.

In [None]:
# Define the function
def my_function_with_inputs(input_name, input_hour_of_day):
    # Perform operations on input arguments
    uppercase_name = input_name.upper()  # Make name uppercase
    
    if input_hour_of_day < 12:
        result = f"Good morning, {uppercase_name}."
    else:
        result = f"Good afternoon, {uppercase_name}."
  
    return result
# Call it

# Call with missing argument


## Your Turn

<img src="images/exercise.png" style="width: 1000px;"/>

<font class="your_turn">
    Your Turn
</font>

1. Write a function named `add_two` that takes two arguments as inputs and returns the sum of the two arguments.
2. Run the function and test how it behaves with a few different input arguments, e.g. two integers, or an integer and a float.
3. Assign the result to a variable named `calculation_result`. Then verify the type and content of the `calculation_result` variable in a new code cell.
4. Bonus: What happens if we give our function two strings as input? And what happens if we give it an integer and a string?

### Solution:

#<span style="color: white">
def add_two(x, y):
    return x + y
#</span>

In [None]:
# add_two(2,4)

#<span style="color: white">
calculation_result = add_two(2, 4)
print("The result is:", calculation_result)
print("The type is:", type(calculation_result))
#</span>
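
One possible way to work through steps 2 and 4 is sketched below. It repeats the `add_two` definition so that it runs on its own; the input values are made up. The `+` operator behaves differently depending on the types it is given, which is why the same function can add numbers, join strings, or fail.

```python
def add_two(x, y):
    return x + y

# An integer plus a float gives a float.
mixed_numbers = add_two(2, 4.5)
print(mixed_numbers, type(mixed_numbers))

# Two strings are joined together (concatenated), not added numerically.
text_result = add_two("data", "frame")
print(text_result)

# An integer and a string cannot be combined with +, so Python raises a TypeError.
try:
    add_two(2, "4")
except TypeError as error:
    mixed_error = str(error)
    print("TypeError:", mixed_error)
```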

In [None]:
# Introduce F-Strings
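
F-strings were already used in the greeting function above, so here is a short sketch of how they work: a string prefixed with `f` may contain `{}` placeholders, and Python fills each one with the value of the expression inside it. The variable names and values below are made up for illustration.

```python
name = "Ada"        # made-up example values
hour = 9
price = 1234.5678

# Variables inside {} are replaced by their values.
greeting = f"Good morning, {name}. It is {hour} o'clock."

# Any expression is allowed inside the braces, not just variable names.
calculated = f"In three hours it will be {hour + 3} o'clock."

# A format specifier after a colon controls the output, here two decimals.
formatted = f"Price: {price:.2f}"

print(greeting)
print(calculated)
print(formatted)
```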

## Questions

Are there any questions up to this point?

<img src="images/any_questions.png" style="width: 1000px;"/>


## Objects and Dot Notation

Dot-notation, which we discussed in relation to packages, has another use -- accessing things inside of *objects*.

What's an object? Basically, a variable that contains other data or functionality inside of it that is exposed to users.

For example, DataFrames are objects.

In [None]:
import pandas as pd
df = pd.DataFrame({'first_name': ['Jannick', 'Arno', 'Andreas', 'Moritz', 'Arno'], 'last_name': ['Töppel', 'Angerer', 'Egger', 'Wöhl', 'Fuchs']})

In [None]:
df

In [None]:
df.shape

In [None]:
df.describe()

You can see that DataFrames have a `shape` variable and a `describe` function inside of them, both accessible through dot notation.

Variables inside an object are often called *attributes* and functions inside objects are called *methods*.
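
The distinction is not specific to DataFrames. As a small sketch, the date object below (from the standard library's `datetime` package, with a made-up date) has both attributes and methods. Note the visible difference: attributes are read without parentheses, methods are called with them.

```python
import datetime as dt

launch_day = dt.date(2024, 1, 15)  # made-up example date

# Attribute: data stored on the object, read without parentheses.
print(launch_day.year)         # 2024

# Methods: functions attached to the object, called with parentheses.
print(launch_day.isoformat())  # 2024-01-15
print(launch_day.weekday())    # 0, which stands for Monday
```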

### Objects, Functions and Methods in the Context of DataFrames

As we saw above, DataFrames are a type of Python object, so let's use them to explore the new Python features we've learned.

Use the `read_csv` function from the pandas package to read in a DataFrame.

In [None]:
df = pd.read_csv('../data/companies.csv')

Use the `type` function to determine the type of `df`.

In [None]:
type(df)

Use the `head` method of the DataFrame to view some of its rows.

In [None]:
df.head()

Examine the `columns` attribute of the DataFrame to see the names of its columns.

In [None]:
df.columns

Inspect the `shape` attribute to find the *dimensions* (number of rows and columns) of the DataFrame.

In [None]:
df.shape

Call the `describe` method to get a summary of the data in the DataFrame.

In [None]:
df.describe()

Depending on the type of the columns (nominal or numeric), the `describe` method gives a different kind of summary.

In [None]:
df.describe()

In [None]:
share_prices = pd.read_csv('../data/prices.csv')
share_prices.head(2)

In [None]:
share_prices.describe()

Now let's combine functions and methods: we use the `type` function to determine what `df.describe` holds.

In [None]:
type(df.describe)

<font class="question">
    <strong>Question</strong>:<br><em>Does this result make sense? What would happen if you added parens? i.e. </em><code>type(df.describe())</code>
</font>

## Your Turn

<img src="images/exercise.png" style="width: 1000px;"/>

<font class="your_turn">
    Your Turn
</font>

That was a lot of input. Take some time and use tab completion to explore which methods and attributes are available for interacting with DataFrames, for example on the `df` object we used above. If you want to find out more about a particular method or function, remember from the Jupyter lesson that you can use the `function_name?` notation, for example `df.describe?`.

Pick 2-3 methods that sound particularly interesting and try to find out more about how they are used and how they work. You can also use the interactive documentation, which opens with `Shift+Tab` when the cursor is on a function.
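
Tab completion and the `?` notation are Jupyter features. Outside a notebook, a similar exploration is possible with Python's built-in `dir` function and the `__doc__` attribute, as sketched below on a datetime object from the earlier exercise. Filtering out names that start with an underscore is a choice made for this sketch, to hide Python's internal names.

```python
import datetime as dt

now = dt.datetime.now()

# dir() lists every name reachable with dot notation on an object.
# Names starting with "_" are internal, so we leave them out here.
public_names = [name for name in dir(now) if not name.startswith("_")]

print(public_names[:5])  # the first few attributes and methods
print("year" in public_names, "isoformat" in public_names)

# Most functions and methods carry their documentation in __doc__.
print(str.upper.__doc__)
```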