# Pandas, Lambdas & Axis

## Functions

The real power in any programming language is the **Function**.

A function is:

* a little block of script (one line or many) that performs a specific task or a series of tasks.
* reusable, which helps us keep our code DRY (Don't Repeat Yourself).
* triggered when something "invokes" or "calls" it.

There are two types of functions in Python: ```defined functions``` and ```lambda functions```.

### First, a defined function

```
def functionName(argument):
    expression using provided argument
```
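
As a rough sketch of that structure, here are two small defined functions. The names `say_hello` and `times_two` are made up for illustration; the second one works on anything that supports the `*` operator, so it doubles numbers and repeats strings.

```python
# A defined function with no parameters
def say_hello():
    return "Hello!"


# A defined function with one parameter
def times_two(value):
    # works on any value that supports the * operator
    return value * 2


greeting = say_hello()            # invoke (call) the function
doubled_number = times_two(21)    # on a number: arithmetic doubling
doubled_string = times_two("ha")  # on a string: repetition

print(greeting, doubled_number, doubled_string)
```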

In [None]:
## a really simple function:


In [None]:
## invoke the function


In [None]:
## Add parameter


In [None]:
## invoke function


In [None]:
## create a defined function that takes anything and multiplies it by two


In [None]:
## try it on a number


In [None]:
## try it on a string


### Lambda (or anonymous) Functions

* Lambda functions allow you to create a custom calculation.
* You can then apply this calculation to targeted columns in your Pandas dataframe.
* Lambda functions, also known as ```anonymous functions```, are considered quick, throwaway functions.



## Structure of a Lambda Function


 ```lambda arguments: expression```
 
 A real lambda function:
 
 ```lambda x: x * 2```
 
 Compared to:
```
def times2(x):
    return x*2
```
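
A quick way to convince yourself the two forms are equivalent is to run them side by side. In this sketch the lambda is assigned to a name (`times2_lambda`, a made-up name) only so we can call it; normally a lambda is written inline and thrown away.

```python
# the defined version
def times2(x):
    return x * 2


# the lambda version, named here only so we can call it
times2_lambda = lambda x: x * 2

result_def = times2(5)
result_lambda = times2_lambda(5)

# a lambda can also be called inline without ever being named
result_inline = (lambda x: x * 2)(5)

print(result_def, result_lambda, result_inline)
```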

In [None]:
## import library
import pandas as pd
import random ## generate some random values

## Applying Lambdas to Pandas dataframes

* In Pandas, we ```apply()``` lambda expressions to columns.

Here we create a new column by multiplying existing values in each row of a column by 2.

```df['new_column'] = df['existing_column'].apply(lambda x: x * 2)```
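
Here is a minimal, self-contained sketch of that pattern. The dataframe `df_demo` and its values are invented for the example and are not the mock data used in the rest of the lesson.

```python
import pandas as pd

# hypothetical three-row dataframe, just for this illustration
df_demo = pd.DataFrame({"existing_column": [1, 2, 3]})

# .apply() on a single column passes each value to the lambda, one at a time
df_demo["new_column"] = df_demo["existing_column"].apply(lambda x: x * 2)

print(df_demo)
```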

In [None]:
## create a df with mock data

data = {'firm': ['Bilk Inc.', 'Vine & Co.', 'Kiln Inc.', 'Y & Y Consulting', 'Trending Inc.', "State Insurance"],
   'net_2020': [1150, 2300, 3400, 1500, 6500, 1000 ],
        'net_2021': [1216, 2619, 3701, 1890, 5630, 1099 ]
           }
df = pd.DataFrame(data)

df

In [None]:
## create a copy to test something 
dfx = df.copy()
dfx

In [None]:
## create a column called net_2022 that is a 10 percent increase from net_2021


In [None]:
### True that we could just multiply: 


### But this won't allow us to carry out more complex operations, like if multiple columns are involved.

In [None]:
## create a new copy called dfa


In [None]:
## create a random value between 1 and 1.5 to two decimal places
## uniform draws from a uniform distribution, not a bell curve (normal distribution):
## a value anywhere between the two numbers is equally likely
round(random.uniform(1,1.5), 2)

In [None]:
## apply as a lambda expression on our dataframe
## increase net income in 2021 by a random number between 1 and 100 percent


## Apply Lambdas using values from specific columns

```df['new_col'] = df.apply(lambda x: x['column1'] + x['column2'], axis = 1)```
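
A small sketch of the same pattern, using a trimmed copy of the mock data (first three firms only). The dataframe name `df_rows` and the column name `net_diff` are assumptions for this example. With `axis = 1` the lambda receives one row at a time, so `x['net_2020']` is that row's value in the `net_2020` column.

```python
import pandas as pd

# trimmed copy of the lesson's mock data (first three firms)
df_rows = pd.DataFrame({
    "firm": ["Bilk Inc.", "Vine & Co.", "Kiln Inc."],
    "net_2020": [1150, 2300, 3400],
    "net_2021": [1216, 2619, 3701],
})

# axis=1 hands the lambda one row at a time
df_rows["net_diff"] = df_rows.apply(
    lambda x: x["net_2021"] - x["net_2020"], axis=1
)

print(df_rows)
```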


In [None]:
## call our dfa


### Use ```lambda``` to target specific columns in a dataframe

In [None]:
## Find the difference between net_2020 and net_2022


## What is ```axis = 1```?

In [None]:
## what happens to the same calculation without axis = 1?
## Find the difference between net_2021 and net_2022
## this will break!


## So what is ```axis``` in Pandas?

<img src="https://raw.githubusercontent.com/sandeepmj/22spring-advanced-data/main/img/axis.png" width="75%">
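
One way to see the difference is to look at what `.apply()` hands to the lambda in each case. This is only a sketch on a two-row dataframe (`df_axis` is a made-up name): with the default `axis = 0` the lambda gets a whole column, whose labels are the row numbers, so asking it for `x['net_2021']` raises a `KeyError`. With `axis = 1` it gets a row, whose labels are the column names.

```python
import pandas as pd

# two-row slice of the mock data, numeric columns only
df_axis = pd.DataFrame({
    "net_2020": [1150, 2300],
    "net_2021": [1216, 2619],
})

first_column = df_axis.iloc[:, 0]  # the kind of object axis=0 passes to the lambda
first_row = df_axis.iloc[0]        # the kind of object axis=1 passes to the lambda

print(list(first_column.index))  # row labels
print(list(first_row.index))     # column names

# without axis=1 the lambda looks for a row labelled "net_2021" and fails
try:
    df_axis.apply(lambda x: x["net_2021"] - x["net_2020"])
    broke = False
except KeyError:
    broke = True

# with axis=1 the same lambda works row by row
row_wise = df_axis.apply(lambda x: x["net_2021"] - x["net_2020"], axis=1)

print(broke)
print(row_wise)
```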

## ```axis = 0```

In [None]:
## create a df for axis = 0 
df_ax0 = df.copy()
df_ax0

```axis = 0``` runs a calculation down the rows of a single column, for example totaling every value in that column.

I rarely write ```axis = 0``` explicitly because it is the default: functions like ```.mean()``` and ```.sum()``` already work down whichever column you point them at, as below.

But we will use ```axis = 0``` in a few weeks on more custom operations.
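
To illustrate, this sketch sums two columns with and without an explicit `axis = 0` and gets the same totals both times. It uses a trimmed copy of the mock data; `df_cols` is a made-up name.

```python
import pandas as pd

# trimmed copy of the lesson's mock data (first three firms)
df_cols = pd.DataFrame({
    "firm": ["Bilk Inc.", "Vine & Co.", "Kiln Inc."],
    "net_2020": [1150, 2300, 3400],
    "net_2021": [1216, 2619, 3701],
})

# .sum() works down each column by default...
totals_default = df_cols[["net_2020", "net_2021"]].sum()

# ...which is the same as asking for axis=0 explicitly
totals_axis0 = df_cols[["net_2020", "net_2021"]].sum(axis=0)

print(totals_default)
print(totals_axis0)
```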

In [None]:
## note how the calculation is done vertically to sum all the
## items in net_2020, net_2021 


## ```axis = 1```
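
As a minimal sketch, `axis = 1` makes `.sum()` go across the columns, producing one total per row. The dataframe name `df_across` and the column name `two_year_total` are assumptions for this example.

```python
import pandas as pd

# trimmed copy of the lesson's mock data (first three firms)
df_across = pd.DataFrame({
    "firm": ["Bilk Inc.", "Vine & Co.", "Kiln Inc."],
    "net_2020": [1150, 2300, 3400],
    "net_2021": [1216, 2619, 3701],
})

# axis=1 adds across the selected columns, one result per row
df_across["two_year_total"] = df_across[["net_2020", "net_2021"]].sum(axis=1)

print(df_across)
```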

In [None]:
## creating another copy of the original df so we can manipulate it
df_ax1 = df.copy()
df_ax1

In [None]:
## We want to add each value in the net_2020 column to the corresponding value in the net_2021 column
## GOING ACROSS THE columns, we use axis = 1


### What is the percentage change between net_2021 and net_2022 in dfa?

We can use ```df.pct_change()```
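
A hedged sketch of how that might look. Because `net_2022` in `dfa` is random, this example compares `net_2020` and `net_2021` from a trimmed copy of the mock data instead (`df_pct` is a made-up name). Passing `axis = 1` tells `pct_change()` to compare each column with the column to its left, so the first column comes back as `NaN`.

```python
import pandas as pd

# numeric columns from the first three firms of the mock data
df_pct = pd.DataFrame({
    "net_2020": [1150, 2300, 3400],
    "net_2021": [1216, 2619, 3701],
})

# axis=1 compares each column with the one to its left
changes = df_pct[["net_2020", "net_2021"]].pct_change(axis=1)

# pct_change returns a fraction, so multiply by 100 for a percentage
df_pct["pct_chg"] = changes["net_2021"] * 100

print(changes)
print(df_pct)
```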

In [None]:
## call dfa again


In [None]:
## find percent change between 2022 and 2021


### Use Lambda to calc percent change
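
One possible version, sketched on a trimmed copy of the mock data (`df_lam` is a made-up name): percent change is the new value minus the old value, divided by the old value, times 100. The lambda needs `axis = 1` because it reads two columns from each row.

```python
import pandas as pd

# trimmed copy of the lesson's mock data (first three firms)
df_lam = pd.DataFrame({
    "firm": ["Bilk Inc.", "Vine & Co.", "Kiln Inc."],
    "net_2020": [1150, 2300, 3400],
    "net_2021": [1216, 2619, 3701],
})

# (new - old) / old * 100, computed row by row
df_lam["pct_chg"] = df_lam.apply(
    lambda x: (x["net_2021"] - x["net_2020"]) / x["net_2020"] * 100,
    axis=1,
)

print(df_lam)
```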

In [None]:
## What is the percentage change between net_2020 and net_2021?
## Create a new column called "pct_chg"


In [None]:
## without axis = 1, this will break:


## Lambda conditionals
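
A lambda can hold a one-line `if ... else` expression. The sketch below is one way to flag firms whose net grew by 10% or more from 2020 to 2021; the dataframe name `df_goal` is an assumption, and the data is the same mock data used throughout.

```python
import pandas as pd

# same mock data as the lesson
df_goal = pd.DataFrame({
    "firm": ["Bilk Inc.", "Vine & Co.", "Kiln Inc.",
             "Y & Y Consulting", "Trending Inc.", "State Insurance"],
    "net_2020": [1150, 2300, 3400, 1500, 6500, 1000],
    "net_2021": [1216, 2619, 3701, 1890, 5630, 1099],
})

# value_if_true if condition else value_if_false, evaluated for every row
df_goal["met_goal"] = df_goal.apply(
    lambda x: True
    if (x["net_2021"] - x["net_2020"]) / x["net_2020"] >= 0.10
    else False,
    axis=1,
)

print(df_goal)
```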

In [None]:
## create a new copy  
dfc = df.copy()
dfc

In [None]:
## create a column called "met_goal".
## value = True if net gain of 10% or more
## value = False if net gain less than 10%


In [None]:
## create a column that lists the percent diff

