### <span style="color:black"><b>Pandas Tutorial 12</b></span>

<ins>Manipulating a Series & DataFrame in Pandas</ins>

<pre>
replace()
apply()
map()
applymap()
</pre>
* Manipulate pandas Series and DataFrame objects in a more sophisticated way using `replace()`, `apply()`, `map()` and, eventually, `applymap()`
* [Lambda functions](https://towardsdatascience.com/lambda-functions-with-practical-examples-in-python-45934f3653a8), a useful Python concept that pairs really nicely with these methods


| Method     | Series | DataFrame |
|------------|--------|-----------|
| replace()  | ✅     | ✅        |
| apply()    | ✅     | ✅        |   
| applymap() | -      | ✅        |   
| map()      | ✅     | -         |   


<u>**Using Map**</u>

In [5]:
# Unique vals

array(['Yes', 'No', nan], dtype=object)

In [6]:
# Map 'Yes' to 1 and 'No' to 0
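A minimal sketch of this step on a made-up Series (the real column name isn't shown in this cell, so `answers` is hypothetical). Note that the missing value survives the mapping as NaN, which forces the result to float64:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the column above: 'Yes', 'No' and a missing value
answers = pd.Series(["Yes", "No", np.nan, "Yes"])

# map() looks each value up in the dict; NaN (and any value not in the dict) becomes NaN
as_numbers = answers.map({"Yes": 1, "No": 0})
print(as_numbers.dtype)  # float64, because of the NaN
```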


In [7]:
# Change to int since that is what we were originally going for
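Why the extra step? With a NaN in the column, a plain `astype(int)` raises. A sketch on toy data of two common ways around it; which one fits depends on what a missing answer should mean, and filling with 0 below is only an assumption:

```python
import numpy as np
import pandas as pd

# Toy result of the map() step: 1/0 plus one missing answer, so pandas stores it as float64
as_numbers = pd.Series([1.0, 0.0, np.nan, 1.0])

# A plain astype(int) fails because NaN has no integer representation
try:
    as_numbers.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Option 1: pandas' nullable integer dtype keeps the gap as <NA>
nullable = as_numbers.astype("Int64")

# Option 2: decide what a missing answer means (0 is an assumption here), then cast
filled = as_numbers.fillna(0).astype(int)
```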


<u>**Using Replace**</u>

In [8]:
# Get genders in our dataset

array(['Man', 'Woman'], dtype=object)

In [9]:
# Using replace
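A sketch of `replace()` on a hypothetical stand-in for the gender column (the real column name isn't shown here). Encoding 'Woman' as 1 and 'Man' as 0 is an assumption that matches the 'Woman' name given to the column in the next cell:

```python
import pandas as pd

# Hypothetical stand-in for the gender column; dtype=object matches how strings
# read from a CSV are stored in pandas versions before 3.0
gender = pd.Series(["Man", "Woman", "Woman", "Man"], dtype=object)

# replace() swaps the values listed in the dict and leaves any other value untouched;
# astype(int) makes the final integer dtype explicit
woman_flag = gender.replace({"Man": 0, "Woman": 1}).astype(int)
```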


In [10]:
# Rename the column to 'Woman'
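A sketch of the rename on a toy DataFrame. The original column name isn't shown in this notebook, so `Gender` below is purely hypothetical:

```python
import pandas as pd

# Hypothetical frame; 'Gender' is an assumed name for the original column
toy = pd.DataFrame({"Gender": [0, 1, 1], "Age": [25.0, 32.0, 35.0]})

# rename() returns a new DataFrame; only the columns listed in the dict change
toy = toy.rename(columns={"Gender": "Woman"})
```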


<u>**Using Apply & Lambda Functions**</u>

In [11]:
# For this one, work with the NEWOnboardGood column
df.NEWOnboardGood.unique()

array(['No', 'Yes'], dtype=object)

In [12]:
# Use our lambda function
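A sketch of `apply()` with a lambda on a toy stand-in for `NEWOnboardGood`. Keep in mind that this lambda sends anything other than 'Yes' (including a missing value) to 0:

```python
import pandas as pd

onboard = pd.Series(["No", "Yes", "Yes"])  # hypothetical stand-in for NEWOnboardGood

# apply() calls the lambda once per element of the Series
onboard_num = onboard.apply(lambda x: 1 if x == "Yes" else 0)
```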

#### 🧠 **Exercise** 
* Do the same thing with the 'NEWOtherComms' series but this time use a custom function

In [13]:
# NEWOtherComms uniques

array(['No', 'Yes'], dtype=object)

In [14]:
# Function

In [15]:
# Apply function to our series
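One possible solution sketch, with a hypothetical helper name `yes_no_to_int`. A named function has room for more than one branch, so here missing values stay missing instead of silently becoming 0:

```python
import pandas as pd

def yes_no_to_int(answer):
    """Hypothetical helper: 'Yes' -> 1, 'No' -> 0, anything else (e.g. NaN) -> None."""
    if answer == "Yes":
        return 1
    if answer == "No":
        return 0
    return None  # pandas turns this into NaN in the result

other_comms = pd.Series(["No", "Yes", None])  # hypothetical stand-in for NEWOtherComms
other_comms_num = other_comms.apply(yes_no_to_int)
```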

#### 🧠 **Exercise** 
* Who is the most ambitious employee? (Here, ambition is measured by how many platforms a respondent wants to work with next year.)

In [16]:
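def count_number_of_platforms(mystring):
    """Count the ';'-separated platforms in a string; blank or missing values count as 0."""
    try:
        if len(mystring.strip()) == 0:  # empty or whitespace-only string
            return 0
        list_of_strings = mystring.split(';')
        return len(list_of_strings)
    except AttributeError:  # NaN is a float, which has no .strip()
        return 0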

In [17]:
df['Number of platforms that they want to work with'] = df['PlatformDesireNextYear'].apply(count_number_of_platforms)

In [18]:
df_sorted = df.sort_values(by = 'Number of platforms that they want to work with', ascending = False)
df_sorted[['Number of platforms that they want to work with', 'Age']]

| | Number of platforms that they want to work with | Age |
|---|---|---|
| 1629 | 16 | 27.0 |
| 1758 | 16 | 46.0 |
| 1569 | 15 | 23.0 |
| 1166 | 15 | 35.0 |
| 2156 | 14 | 27.0 |
| ... | ... | ... |
| 1142 | 0 | 54.0 |
| 1824 | 0 | 25.0 |
| 1828 | 0 | 25.0 |
| 2665 | 0 | 48.0 |
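Sorting works, but `idxmax()` answers "who?" directly and `nlargest()` gives a ready-sorted top-n. A sketch on made-up data (the toy frame and the `n_platforms` column name are hypothetical; the counting function mirrors the notebook's, using an `isinstance` check instead of `try`/`except`):

```python
import pandas as pd

def count_number_of_platforms(mystring):
    # Same idea as the notebook function: non-strings or blanks -> 0,
    # otherwise count the ';'-separated items
    if not isinstance(mystring, str) or not mystring.strip():
        return 0
    return len(mystring.split(";"))

toy = pd.DataFrame(
    {"Age": [30.0, 41.0, 22.0, 36.0],
     "PlatformDesireNextYear": ["Docker;Linux;AWS", None, "Windows", "Kubernetes;Linux"]},
    index=[101, 102, 103, 104],
)
toy["n_platforms"] = toy["PlatformDesireNextYear"].apply(count_number_of_platforms)

most_ambitious = toy["n_platforms"].idxmax()  # index label of the (first) largest count
top_two = toy.nlargest(2, "n_platforms")      # the two largest rows, already sorted
```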


In [19]:
# df_sorted.plot(kind = 'scatter', 
#                x = 'Age', 
#                y ='Number of platforms that they want to work with');

#### 🧠 **Exercise**
* Take the 'OrgSize' series and convert it completely into numeric form

In [20]:
import numpy as np
import nums_from_string  # third-party package (nums_from_string on PyPI)

def string_to_num(mystring):
    """Takes all numbers out of the string and places them in a list, taking the average afterwards"""
    try:
        if mystring == 'Just me - I am a freelancer, sole proprietor, etc.':
            return 1
        number_list = nums_from_string.get_nums(mystring)
        avg = np.mean(number_list)
        return int(round(avg, 0))
    except (ValueError, RuntimeWarning):
        # No numbers found: np.mean([]) warns and returns NaN, so int() raises ValueError
        return -999

In [21]:
# Apply the function
df['OrgSize'] = df['OrgSize'].apply(string_to_num)

In [22]:
df.OrgSize

0          60
1       10000
2           6
3           6
4          14
        ...  
2892      300
2893      300
2894     7500
2895       14
2896      300
Name: OrgSize, Length: 2897, dtype: int64
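`nums_from_string` is a third-party package. If you'd rather stick to the standard library, here is a rough sketch of the same idea using a regular expression instead; `string_to_num_re` is a hypothetical name, and the regex assumes commas only ever appear as thousands separators:

```python
import re

FREELANCER = "Just me - I am a freelancer, sole proprietor, etc."

def string_to_num_re(mystring):
    """Hypothetical re-based variant of string_to_num: average of the numbers in the text."""
    if mystring == FREELANCER:
        return 1
    if not isinstance(mystring, str):
        return -999  # missing value, same sentinel as the notebook
    # Digit runs that may contain thousands separators, e.g. '4,999'
    numbers = [int(tok.replace(",", "")) for tok in re.findall(r"\d[\d,]*", mystring)]
    if not numbers:
        return -999  # no numbers in the text
    return int(round(sum(numbers) / len(numbers)))
```

Python's `round()` rounds halves to the even neighbour (14.5 becomes 14), which is also how the notebook's `round(avg, 0)` behaves on a NumPy float.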

#### 🧠 **Exercise**
* Adjust the 'JobSat' series to return a satisfaction score (use any method you like)

In [23]:
df.JobSat.unique()

array(['Very satisfied', 'Slightly satisfied', 'Slightly dissatisfied',
       'Neither satisfied nor dissatisfied', 'Very dissatisfied'],
      dtype=object)

In [24]:
mapping_vals = {'Very satisfied':9, 
                'Slightly satisfied': 6,
                'Very dissatisfied': 2,
                'Neither satisfied nor dissatisfied': 5,
                'Slightly dissatisfied': 4}

df['JobSat'] = df['JobSat'].map(mapping_vals)
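One thing to watch with `map()`: any value missing from the dictionary becomes NaN, whereas `replace()` leaves unmatched values as they were. A sketch on made-up answers, including a hypothetical answer with a stray trailing space:

```python
import pandas as pd

mapping_vals = {"Very satisfied": 9, "Slightly satisfied": 6, "Very dissatisfied": 2,
                "Neither satisfied nor dissatisfied": 5, "Slightly dissatisfied": 4}

# Made-up sample: note the trailing space in the third answer
sat = pd.Series(["Very satisfied", "Slightly dissatisfied", "Very satisfied ", None],
                dtype=object)

mapped = sat.map(mapping_vals)        # unmatched keys become NaN
replaced = sat.replace(mapping_vals)  # unmatched values are left unchanged

# Non-missing answers that the dictionary did not cover
unmapped = sat[mapped.isna() & sat.notna()].unique()
```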

## <span style="color:black"><u>Manipulating a Dataframe</u></span>

* Next, we will go through how to use the `apply()` method on a DataFrame rather than a Series
* The main difference between `apply()` as a Series method and as a DataFrame method is that our function can now consider many columns at once
* The idea is very similar to what we did on a Series, but now we set the `axis` parameter: with `axis=0` (the default, also written `'index'`) the function receives each column, and with `axis=1` (also written `'columns'`) it receives each row
* After that, we will explore `applymap()`, one of my favourites
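A quick sketch, on a toy DataFrame, of what the function actually receives for each `axis` value:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default, 'index'): the function receives each COLUMN as a Series
col_sums = toy.apply(lambda col: col.sum(), axis=0)  # one value per column

# axis=1 ('columns'): the function receives each ROW as a Series
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)  # one value per row
```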

---

#### 🧠 **Exercise**

* Use `apply()` as a DataFrame method to create a column called Bio that considers <u>multiple columns</u>, such as the respondent ID, age, country and salary

In [25]:
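def create_bio(data):
    # With axis='columns', `data` is a single row (a Series indexed by column name).
    # The triple-quoted string keeps its line break and indentation, which is why
    # '\n' followed by spaces shows up in the Bio column of the output.
    return f"""I am respondent {data['Respondent']} and I am a {data['Age']} 
    year old from {data['Country']} who earns {data['CompTotal']}
    """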

In [26]:
df['Bio'] = df.apply(create_bio, axis = 'columns')

In [27]:
df.head(2)

| | Respondent | Hobbyist | Age | CompFreq | CompTotal | Country | CurrencySymbol | EdLevel | Employment | JobSat | NEWDevOps | NEWLearn | NEWOffTopic | NEWOnboardGood | NEWOtherComms | NEWStuck | OrgSize | PlatformDesireNextYear | Woman | UndergradMajor | Number of platforms that they want to work with | Bio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69.0 | 1 | 25.0 | Yearly | 550000.0 | France | EUR | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | Employed full-time | 9 | Yes | Once a year | No | 0 | 0 | Call a coworker or friend;Visit Stack Overflow... | 60 | Kubernetes;Linux | 0 | Computer science, computer engineering, or sof... | 2 | I am respondent 69.0 and I am a 25.0 \n yea... |
| 1 | 80.0 | 1 | 32.0 | Yearly | 94500.0 | United States | USD | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Employed full-time | 9 | Yes | Once a year | Not sure | 0 | 0 | Call a coworker or friend;Visit Stack Overflow... | 10000 | Docker;Kubernetes;Linux;Microsoft Azure;Windows | 1 | Information systems, information technology, o... | 5 | I am respondent 80.0 and I am a 32.0 \n yea... |
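If you'd prefer the bio without the embedded `\n` and the trailing `.0`s, adjacent f-strings can be concatenated instead of using triple quotes. A sketch on two toy rows (`create_bio_oneline` is a hypothetical name, and the `int()` calls assume Respondent and Age are never missing):

```python
import pandas as pd

def create_bio_oneline(row):
    # Adjacent f-strings are joined at compile time, so no newline or indentation
    # ends up in the text; ':,.0f' prints the salary with thousands separators
    return (f"I am respondent {int(row['Respondent'])} and I am a {int(row['Age'])} "
            f"year old from {row['Country']} who earns {row['CompTotal']:,.0f}")

toy = pd.DataFrame({"Respondent": [69.0, 80.0], "Age": [25.0, 32.0],
                    "Country": ["France", "United States"],
                    "CompTotal": [550000.0, 94500.0]})
toy["Bio"] = toy.apply(create_bio_oneline, axis="columns")
```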


#### 🧠 **IDEA**
Turn the CompTotal series into Aussie dollars. [Here](https://medium.com/analytics-vidhya/convert-currencies-automatically-with-python-python-in-audit-2-6c574dbae44) is a program that does exactly this using apply!

Anyway, on to applymap...
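A rough sketch of the idea with `apply(axis='columns')`, so each row can look at both its currency and its amount. The rates below are made-up placeholders, not real exchange rates, and `to_aud` is a hypothetical name:

```python
import math

import pandas as pd

# Made-up placeholder rates, NOT real exchange rates
RATES_TO_AUD = {"AUD": 1.0, "EUR": 2.0, "USD": 1.5}

def to_aud(row):
    rate = RATES_TO_AUD.get(row["CurrencySymbol"])
    if rate is None:
        return math.nan  # currency not in the lookup table
    return row["CompTotal"] * rate

toy = pd.DataFrame({"CompTotal": [100.0, 200.0, 300.0],
                    "CurrencySymbol": ["EUR", "USD", "GBP"]})
toy["CompTotalAUD"] = toy.apply(to_aud, axis="columns")
```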

---

<u>applymap()</u>

* `applymap()` works much like the methods we have just covered, but it calls your function on every individual entry in the DataFrame
* No axis argument is needed here, since we are just sweeping through each individual element of the DataFrame (without the need for a `for` loop, may I add)
* This is handy if we would like to change every integer value into a float, or put every string in uppercase across our entire dataset
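A small sketch on a toy DataFrame covering both cases from the last bullet. One version note: pandas 2.1 renamed `DataFrame.applymap()` to `DataFrame.map()` and deprecated the old name, so the sketch picks whichever method exists:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"n": [1, 2], "name": ["ann", "bob"]})

# DataFrame.map() exists from pandas 2.1; fall back to applymap() on older versions
elementwise = toy.map if hasattr(pd.DataFrame, "map") else toy.applymap

# Every string in uppercase, everything else left alone
upper = elementwise(lambda x: x.upper() if isinstance(x, str) else x)

# Every integer turned into a float
as_float = elementwise(lambda x: float(x) if isinstance(x, (int, np.integer)) else x)
```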

---

This is best shown through an example:

* Suppose that we were concerned that the Country series contained the value 'france' but also 'France ', 'FranCE' and 'fRanCe'
* Suppose that in the CurrencySymbol series we were also concerned that there was a value 'EUR' but also 'eur   ', even though both are supposed to represent the same thing
* Use `applymap()` to solve this by converting every string to lower case and stripping surrounding whitespace, so that there are no case issues and no whitespace issues
* This is easily in my top 10 favourite lines of pandas code
* My favourite line of pandas code will be in the video about `pd.concat()` 👀


In [28]:
# "for each entry, put it into lowercase and strip whitespace if it is a string, otherwise just leave it as it is"
df = df.applymap(lambda x: x.lower().strip() if isinstance(x, str) else x)
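When only one column is messy, a Series-level alternative is the `.str` accessor, which leaves every other column untouched. A sketch on a made-up Country Series:

```python
import numpy as np
import pandas as pd

# Made-up messy Country values like the ones described above, plus a missing entry
country = pd.Series(["france", "France ", "FranCE", "fRanCe", np.nan])

# The .str accessor lowercases and strips every string; missing values stay missing
tidy_country = country.str.lower().str.strip()
```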

In [29]:
# A silly example - Take each numeric element and square it for no reason whatsoever
# Of course we will not be saving this to a new dataframe
df.applymap(lambda x: x**2 if isinstance(x, (int, float)) else x)

| | Respondent | Hobbyist | Age | CompFreq | CompTotal | Country | CurrencySymbol | EdLevel | Employment | JobSat | NEWDevOps | NEWLearn | NEWOffTopic | NEWOnboardGood | NEWOtherComms | NEWStuck | OrgSize | PlatformDesireNextYear | Woman | UndergradMajor | Number of platforms that they want to work with | Bio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.761000e+03 | 1 | 625.0 | yearly | 3.025000e+11 | france | eur | master’s degree (m.a., m.s., m.eng., mba, etc.) | employed full-time | 81 | yes | once a year | no | 0 | 0 | call a coworker or friend;visit stack overflow... | 3600 | kubernetes;linux | 0 | computer science, computer engineering, or sof... | 4 | i am respondent 69.0 and i am a 25.0 \n yea... |
| 1 | 6.400000e+03 | 1 | 1024.0 | yearly | 8.930250e+09 | united states | usd | bachelor’s degree (b.a., b.s., b.eng., etc.) | employed full-time | 81 | yes | once a year | not sure | 0 | 0 | call a coworker or friend;visit stack overflow... | 100000000 | docker;kubernetes;linux;microsoft azure;windows | 1 | information systems, information technology, o... | 25 | i am respondent 80.0 and i am a 32.0 \n yea... |
| 2 | 1.416100e+04 | 1 | 1225.0 | yearly | 2.102500e+10 | united states | usd | bachelor’s degree (b.a., b.s., b.eng., etc.) | employed full-time | 81 | yes | | no | 1 | 1 | meditate;call a coworker or friend;visit stack... | 36 | docker;kubernetes;linux;macos;microsoft azure;... | 1 | another engineering discipline (such as civil,... | 36 | i am respondent 119.0 and i am a 35.0 \n ye... |
| 3 | 1.638400e+04 | 1 | 529.0 | yearly | 6.400000e+09 | united states | usd | some college/university study without earning ... | employed full-time | 36 | no | once a year | not sure | 0 | 0 | play games;visit stack overflow;go for a walk ... | 36 | android;docker | 1 | computer science, computer engineering, or sof... | 4 | i am respondent 128.0 and i am a 23.0 \n ye... |
| 4 | 1.716100e+04 | 1 | 1369.0 | yearly | 1.322500e+10 | united states | usd | bachelor’s degree (b.a., b.s., b.eng., etc.) | employed full-time | 36 | no | every few months | no | 1 | 1 | visit stack overflow;go for a walk or other ph... | 196 | arduino;aws;docker;raspberry pi;windows | 1 | mathematics or statistics | 25 | i am respondent 131.0 and i am a 37.0 \n ye... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2892 | 3.951757e+09 | 1 | 900.0 | yearly | 5.290000e+10 | united states | usd | bachelor’s degree (b.a., b.s., b.eng., etc.) | employed full-time | 36 | yes | every few months | not sure | 0 | 0 | play games;visit stack overflow;go for a walk ... | 90000 | docker;slack apps and integrations | 0 | information systems, information technology, o... | 4 | i am respondent 62863.0 and i am a 30.0 \n ... |
| 2893 | 3.954649e+09 | 1 | 1024.0 | yearly | 1.054729e+10 | united states | usd | master’s degree (m.a., m.s., m.eng., mba, etc.) | employed full-time | 36 | no | once every few years | yes | 1 | 0 | call a coworker or friend;visit stack overflow... | 90000 | docker;microsoft azure;windows | 1 | information systems, information technology, o... | 9 | i am respondent 62886.0 and i am a 32.0 \n ... |
| 2894 | 3.956913e+09 | 1 | 1089.0 | yearly | 9.025000e+09 | united states | usd | associate degree (a.a., a.s., etc.) | employed full-time | 36 | yes | every few months | not sure | 1 | 1 | call a coworker or friend;visit stack overflow... | 56250000 | aws;docker;heroku;kubernetes;linux;macos;raspb... | 1 | computer science, computer engineering, or sof... | 49 | i am respondent 62904.0 and i am a 33.0 \n ... |
| 2895 | 4.005371e+09 | 0 | 961.0 | yearly | 4.225000e+09 | united kingdom | gbp | bachelor’s degree (b.a., b.s., b.eng., etc.) | employed full-time | 81 | yes | every few months | not sure | 0 | 0 | visit stack overflow;watch help / tutorial vid... | 196 | kubernetes | 0 | computer science, computer engineering, or sof... | 1 | i am respondent 63288.0 and i am a 31.0 \n ... |
