# Pandas - Grouping and aggregating - Analysing and exploring your data

## Table of contents

* [Aggregation](#Aggregation)
    * [Statistical aggregation: for instance, `Series.median()`, `Series.describe()`, `DataFrame.median()` and `DataFrame.describe()`](#Statistical-aggregation:-for-instance,-Series.median(),-Series.describe(),-DataFrame.median()-and-DataFrame.describe())
    * [Difference between `.count()`, `.value_counts()` and `.unique()` methods](#Difference-between-.count(),-.value_counts()-and-.unique()-methods)
* [Grouping](#Grouping)
    * [`DataFrameGroupBy` object returned from `DataFrame.groupby(<column>)`](#DataFrameGroupBy-object-returned-from-DataFrame.groupby(<column>))
    * [`DataFrame` object, for a group, returned from `DataFrameGroupBy.get_group(<group_name>)`](#DataFrame-object,-for-a-group,-returned-from-DataFrameGroupBy.get_group(<group_name>))
    * [`SeriesGroupBy` object returned from `DataFrameGroupBy[<column>]`](#SeriesGroupBy-object-returned-from-DataFrameGroupBy[<column>])
    * [`Series` object, for a group, returned from `SeriesGroupBy.get_group(<group_name>)`](#Series-object,-for-a-group,-returned-from-SeriesGroupBy.get_group(<group_name>))
    * [Most used social media sites in India: `DataFrameGroupBy['SocialMedia'].get_group('India').value_counts()` or `DataFrameGroupBy['SocialMedia'].value_counts().loc['India']`, where `DataFrameGroupBy` is `DataFrame.groupby('Country')`](#Most-used-social-media-sites-in-India:-DataFrameGroubBy['SocialMedia'].get_group('India').value_counts()-or-DataFrameGroubBy['SocialMedia'].value_counts().loc['India'],-where-DataFrameGroupBy-is-DataFrame.groupby('Country'))
    * [`.groupby()` method vs filtering](#.groupby()-method-vs-filtering)
    * [Summary of grouping](#Summary-of-grouping)
* [Aggregating results after grouping](#Aggregating-results-after-grouping)
    * [`DataFrame.agg()` and `Series.agg()` methods](#DataFrame.agg()-and-Series.agg()-methods)
    * [`DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` methods](#DataFrameGroupBy.agg()-and-SeriesGroupBy.agg()-methods)
* [Challenge](#Challenge)
    * [Task](#Task)
    * [Solution](#Solution)

***

## Aggregation

In [14]:
import pandas as pd

In [15]:
df = pd.read_csv('work_directory/pandas/data/survey_results_public.csv', index_col='Respondent')
df_schema = pd.read_csv('work_directory/pandas/data/survey_results_schema.csv', index_col='Column')

In [16]:
pd.set_option('display.max_rows', 85)
pd.set_option('display.max_columns', 85)

In [17]:
df.head()

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4.0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;Java;JavaScript;Python,C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL,SQLite,MySQL,MacOS;Windows,Android;Arduino;Windows,Django;Flask,Flask;jQuery,Node.js,Node.js,IntelliJ;Notepad++;PyCharm,Windows,I do not use containers,,,Yes,"Fortunately, someone else has that title",Yes,Twitter,Online,Username,2017,A few times per month or weekly,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,31-60 minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,,"Developer, desktop or enterprise applications;...",,17,,,,,,,I am actively looking for a job,I've never had a job,,,Financial performance or funding status of the...,"Something else changed (education, award, medi...",,,,,,,,,,,,,,,,,C++;HTML/CSS;Python,C++;HTML/CSS;JavaScript;SQL,,MySQL,Windows,Windows,Django,Django,,,Atom;PyCharm,Windows,I do not use containers,,Useful across many domains and could change ma...,Yes,Yes,Yes,Instagram,Online,Username,2017,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",100 to 499 employees,"Designer;Developer, back-end;Developer, front-...",3.0,22,1,Slightly satisfied,Slightly satisfied,Not at all confident,Not sure,Not sure,"I’m not actively looking, but I am open to new...",1-2 years ago,Interview with people in peer roles,No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,THB,Thai baht,23000.0,Monthly,8820.0,40.0,There's no schedule or spec; I work on what se...,Distracting work environment;Inadequate access...,Less than once per month / Never,Home,Average,No,,"No, but I think we should",Not sure,I have little or no influence,HTML/CSS,Elixir;HTML/CSS,PostgreSQL,PostgreSQL,,,,Other(s):,,,Vim;Visual Studio Code,Linux-based,I do not use containers,,,Yes,Yes,Yes,Reddit,In real life (in person),Username,2011,A few times per week,Find answers to specific questions;Learn how t...,6-10 times per week,They were about the same,,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",3.0,16,Less than 1 year,Very satisfied,Slightly satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write code by hand (e.g., on a whiteboard);Int...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,61000.0,Yearly,61000.0,80.0,There's no schedule or spec; I work on what se...,,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Developers typically have the most influence o...,I have little or no influence,C;C++;C#;Python;SQL,C;C#;JavaScript;SQL,MySQL;SQLite,MySQL;SQLite,Linux;Windows,Linux;Windows,,,.NET,.NET,Eclipse;Vim;Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Daily or almost daily,Find answers to specific questions;Pass the ti...,1-2 times per week,Stack Overflow was much faster,31-60 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"10,000 or more employees","Academic researcher;Developer, desktop or ente...",16.0,14,9,Very dissatisfied,Slightly dissatisfied,Somewhat confident,Yes,No,I am not interested in new job opportunities,Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,"Industry that I'd be working in;Languages, fra...",I was preparing for a job search,UAH,Ukrainian hryvnia,,,,55.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Inadequ...,A few days each month,Office,A little above average,"Yes, because I see value in code review",,"Yes, it's part of our process",Not sure,I have little or no influence,C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA,HTML/CSS;Java;JavaScript;SQL;WebAssembly,Couchbase;MongoDB;MySQL;Oracle;PostgreSQL;SQLite,Couchbase;Firebase;MongoDB;MySQL;Oracle;Postgr...,Android;Linux;MacOS;Slack;Windows,Android;Docker;Kubernetes;Linux;Slack,Django;Express;Flask;jQuery;React.js;Spring,Flask;jQuery;React.js;Spring,Cordova;Node.js,Apache Spark;Hadoop;Node.js;React Native,IntelliJ;Notepad++;Vim,Linux-based,"Outside of work, for personal projects",Not at all,,Yes,Also Yes,Yes,Facebook,In real life (in person),Username,I don't remember,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...","Yes, definitely",Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


In [18]:
df['ConvertedComp'].head(15)

Respondent
1          NaN
2          NaN
3       8820.0
4      61000.0
5          NaN
6     366420.0
7          NaN
8          NaN
9      95179.0
10     13293.0
11         NaN
12         NaN
13     90000.0
14     57060.0
15         NaN
Name: ConvertedComp, dtype: float64

### Statistical aggregation: for instance, `Series.median()`, `Series.describe()`, `DataFrame.median()` and `DataFrame.describe()`

To get the **median of a column** (ignoring the `NaN` values):

In [19]:
df['ConvertedComp'].median()

57287.0
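
As a small illustration of the `NaN` handling, here is a sketch on a made-up `Series` (the values and the name `salaries` are hypothetical, not taken from the survey). By default `skipna=True`, so missing values are left out; passing `skipna=False` makes the result `NaN` as soon as one value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries; two entries are missing
salaries = pd.Series([8820.0, np.nan, 61000.0, np.nan, 95179.0])

# Default behaviour (skipna=True): NaN values are ignored
median_ignoring_nan = salaries.median()

# With skipna=False a single NaN makes the result NaN
median_with_nan = salaries.median(skipna=False)

print(median_ignoring_nan)  # 61000.0
print(median_with_nan)      # nan
```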

To get the **statistical summary of a column** (ignoring the `NaN` values):

In [20]:
df['ConvertedComp'].describe()

count    2.505900e+04
mean     1.265119e+05
std      2.829415e+05
min      0.000000e+00
25%      2.566800e+04
50%      5.728700e+04
75%      1.000000e+05
max      2.000000e+06
Name: ConvertedComp, dtype: float64

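To get the **median for all the *numeric* columns** in a `DataFrame` (ignoring the `NaN` values), pass `numeric_only=True`. This makes the restriction explicit, which pandas 2.0 and later require when the frame also has text columns:

In [21]:
df.median(numeric_only=True)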

CompTotal        62000.0
ConvertedComp    57287.0
WorkWeekHrs         40.0
CodeRevHrs           4.0
Age                 29.0
dtype: float64
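
The sketch below runs the same kind of call on a tiny made-up `DataFrame` that mixes a text column with numeric ones, to show that only the numeric columns come back. The frame `toy` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column, two numeric columns with gaps
toy = pd.DataFrame({
    'Country': ['India', 'Germany', 'India', 'Brazil'],
    'ConvertedComp': [13293.0, 90000.0, np.nan, 57060.0],
    'Age': [24.0, 31.0, 20.0, np.nan],
})

# Only numeric columns are aggregated; NaN values are skipped
medians = toy.median(numeric_only=True)
print(medians)
```

The result is a `Series` indexed by column name, so a single value can be read with, for example, `medians['Age']`.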

To get a **statistical summary for all the *numeric* columns** in a `DataFrame` (ignoring the `NaN` values):

In [22]:
df.describe()

| | CompTotal | ConvertedComp | WorkWeekHrs | CodeRevHrs | Age |
|---|---|---|---|---|---|
| count | 25116.0 | 25059.0 | 29009.0 | 22366.0 | 35513.0 |
| mean | 1194709000000.0 | 126511.9 | 42.245933 | 5.131483 | 30.346116 |
| std | 109286900000000.0 | 282941.5 | 40.323693 | 5.689917 | 9.148612 |
| min | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 25% | 20000.0 | 25668.0 | 40.0 | 2.0 | 24.0 |
| 50% | 62000.0 | 57287.0 | 40.0 | 4.0 | 29.0 |
| 75% | 120000.0 | 100000.0 | 44.0 | 6.0 | 35.0 |
| max | 1e+16 | 2000000.0 | 4125.0 | 99.0 | 99.0 |


### Difference between `.count()`, `.value_counts()` and `.unique()` methods

`.count()` returns the *number of non-missing values* in a `Series` (or `DataFrame`).

`.value_counts()` returns a *tally of unique values* in a `Series`.

`.unique()` returns *all the unique values without their counts* in a `Series`.

In [23]:
df['Hobbyist'].count()

39755

In [24]:
df['Hobbyist'].value_counts()

Yes    31930
No      7825
Name: Hobbyist, dtype: int64

In [25]:
df['Hobbyist'].unique()

array(['Yes', 'No'], dtype=object)

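Note:
- `.count()` can be called on both `Series` and `DataFrame` objects.
- `.value_counts()` is mostly used on `Series` objects. Since pandas 1.1 there is also a `DataFrame.value_counts()`, which counts unique rows rather than the values of a single column.
- `.unique()` can be called only on `Series` objects.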
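
The three methods also differ in how they treat missing values, which the survey column above does not show. Here is a minimal sketch on a made-up `Series` (the name `answers` and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical answers with one missing response
answers = pd.Series(['Yes', 'No', 'Yes', np.nan, 'Yes'])

# Number of non-missing values
n_answered = answers.count()

# Tally of each unique value; NaN is dropped by default
tally = answers.value_counts()

# Same tally, but keeping NaN as its own entry
tally_with_nan = answers.value_counts(dropna=False)

# Distinct values without counts; NaN is included here
distinct = answers.unique()

print(n_answered)  # 4
print(tally)
print(distinct)
```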

Now, suppose we want to find the **most used social media site**:

In [26]:
df['SocialMedia']

Respondent
1          Twitter
2        Instagram
3           Reddit
4           Reddit
5         Facebook
           ...    
39996      Twitter
39997      YouTube
39998     LinkedIn
39999     WhatsApp
40000     WhatsApp
Name: SocialMedia, Length: 39755, dtype: object

In [27]:
df_schema.loc['SocialMedia']

QuestionText    What social media site do you use the most?
Name: SocialMedia, dtype: object

In [28]:
df['SocialMedia'].value_counts()

Reddit                      6582
YouTube                     6123
WhatsApp                    6009
Facebook                    5907
Twitter                     5155
Instagram                   2783
I don't use social media    2500
LinkedIn                    1967
WeChat 微信                    290
Snapchat                     284
VK ВКонта́кте                257
Weibo 新浪微博                    25
Youku Tudou 优酷                12
Hello                         12
Name: SocialMedia, dtype: int64

So, *'Reddit'* seems to be the most used social media site.

If, instead of the raw number of responses for each social media site, we want each site's share of the total responses, we can set the argument `normalize=True`. The result is a fraction between 0 and 1, so multiply it by 100 to read it as a percentage:

In [29]:
df['SocialMedia'].value_counts(normalize=True)

Reddit                      0.173640
YouTube                     0.161531
WhatsApp                    0.158524
Facebook                    0.155833
Twitter                     0.135994
Instagram                   0.073418
I don't use social media    0.065953
LinkedIn                    0.051892
WeChat 微信                   0.007651
Snapchat                    0.007492
VK ВКонта́кте               0.006780
Weibo 新浪微博                  0.000660
Youku Tudou 优酷              0.000317
Hello                       0.000317
Name: SocialMedia, dtype: float64

So about *17%* of the respondents have chosen *'Reddit'* as their most used social media site.
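
To report the shares directly as percentages, the normalized result can be scaled and rounded. This is a sketch on a made-up `Series` (the name `sites` and its values are hypothetical, not survey data):

```python
import pandas as pd

# Hypothetical answers to "most used social media site"
sites = pd.Series(['Reddit', 'YouTube', 'Reddit', 'WhatsApp',
                   'Reddit', 'YouTube', 'Twitter', 'Reddit'])

# Fractions of the total; they sum to 1
shares = sites.value_counts(normalize=True)

# Scale to percentages and round to one decimal place
percentages = (shares * 100).round(1)
print(percentages)
```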

## Grouping

In [30]:
df.head()

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4.0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;Java;JavaScript;Python,C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL,SQLite,MySQL,MacOS;Windows,Android;Arduino;Windows,Django;Flask,Flask;jQuery,Node.js,Node.js,IntelliJ;Notepad++;PyCharm,Windows,I do not use containers,,,Yes,"Fortunately, someone else has that title",Yes,Twitter,Online,Username,2017,A few times per month or weekly,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,31-60 minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,,"Developer, desktop or enterprise applications;...",,17,,,,,,,I am actively looking for a job,I've never had a job,,,Financial performance or funding status of the...,"Something else changed (education, award, medi...",,,,,,,,,,,,,,,,,C++;HTML/CSS;Python,C++;HTML/CSS;JavaScript;SQL,,MySQL,Windows,Windows,Django,Django,,,Atom;PyCharm,Windows,I do not use containers,,Useful across many domains and could change ma...,Yes,Yes,Yes,Instagram,Online,Username,2017,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",100 to 499 employees,"Designer;Developer, back-end;Developer, front-...",3.0,22,1,Slightly satisfied,Slightly satisfied,Not at all confident,Not sure,Not sure,"I’m not actively looking, but I am open to new...",1-2 years ago,Interview with people in peer roles,No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,THB,Thai baht,23000.0,Monthly,8820.0,40.0,There's no schedule or spec; I work on what se...,Distracting work environment;Inadequate access...,Less than once per month / Never,Home,Average,No,,"No, but I think we should",Not sure,I have little or no influence,HTML/CSS,Elixir;HTML/CSS,PostgreSQL,PostgreSQL,,,,Other(s):,,,Vim;Visual Studio Code,Linux-based,I do not use containers,,,Yes,Yes,Yes,Reddit,In real life (in person),Username,2011,A few times per week,Find answers to specific questions;Learn how t...,6-10 times per week,They were about the same,,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",3.0,16,Less than 1 year,Very satisfied,Slightly satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write code by hand (e.g., on a whiteboard);Int...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,61000.0,Yearly,61000.0,80.0,There's no schedule or spec; I work on what se...,,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Developers typically have the most influence o...,I have little or no influence,C;C++;C#;Python;SQL,C;C#;JavaScript;SQL,MySQL;SQLite,MySQL;SQLite,Linux;Windows,Linux;Windows,,,.NET,.NET,Eclipse;Vim;Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Daily or almost daily,Find answers to specific questions;Pass the ti...,1-2 times per week,Stack Overflow was much faster,31-60 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"10,000 or more employees","Academic researcher;Developer, desktop or ente...",16.0,14,9,Very dissatisfied,Slightly dissatisfied,Somewhat confident,Yes,No,I am not interested in new job opportunities,Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,"Industry that I'd be working in;Languages, fra...",I was preparing for a job search,UAH,Ukrainian hryvnia,,,,55.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Inadequ...,A few days each month,Office,A little above average,"Yes, because I see value in code review",,"Yes, it's part of our process",Not sure,I have little or no influence,C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA,HTML/CSS;Java;JavaScript;SQL;WebAssembly,Couchbase;MongoDB;MySQL;Oracle;PostgreSQL;SQLite,Couchbase;Firebase;MongoDB;MySQL;Oracle;Postgr...,Android;Linux;MacOS;Slack;Windows,Android;Docker;Kubernetes;Linux;Slack,Django;Express;Flask;jQuery;React.js;Spring,Flask;jQuery;React.js;Spring,Cordova;Node.js,Apache Spark;Hadoop;Node.js;React Native,IntelliJ;Notepad++;Vim,Linux-based,"Outside of work, for personal projects",Not at all,,Yes,Also Yes,Yes,Facebook,In real life (in person),Username,I don't remember,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...","Yes, definitely",Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


In [31]:
df['Country']

Respondent
1                United Kingdom
2        Bosnia and Herzegovina
3                      Thailand
4                 United States
5                       Ukraine
                  ...          
39996                   Hungary
39997                    Brazil
39998                   Germany
39999                   Germany
40000               Afghanistan
Name: Country, Length: 39755, dtype: object

In [32]:
df['Country'].value_counts()

United States        9402
India                4072
Germany              2610
United Kingdom       2549
Canada               1536
                     ... 
Niger                   1
Brunei Darussalam       1
Sierra Leone            1
Gabon                   1
Bahamas                 1
Name: Country, Length: 173, dtype: int64

In [33]:
type(df['Country'].value_counts())

pandas.core.series.Series

### `DataFrameGroupBy` object returned from `DataFrame.groupby(<column>)`

The `DataFrame.groupby()` method returns a `DataFrameGroupBy` object that records which rows belong to which *group*.

In [34]:
df.groupby('Country')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000017D6EE2FEC8>

In [35]:
type(df.groupby('Country'))

pandas.core.groupby.generic.DataFrameGroupBy

In [36]:
country_grp = df.groupby('Country')
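
To get a feel for what such an object holds, here is a sketch on a small made-up `DataFrame` (the frame `toy` and its values are hypothetical). It uses `ngroups`, `size()` and plain iteration to inspect the groups.

```python
import pandas as pd

# Hypothetical survey extract, indexed by respondent number
toy = pd.DataFrame(
    {
        'Country': ['India', 'Germany', 'India', 'Brazil', 'Germany', 'India'],
        'SocialMedia': ['WhatsApp', 'Reddit', 'YouTube', 'WhatsApp', 'Reddit', 'WhatsApp'],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name='Respondent'),
)

toy_grp = toy.groupby('Country')

# Number of distinct groups
n_groups = toy_grp.ngroups

# Number of rows in each group, as a Series indexed by country
group_sizes = toy_grp.size()

# Iterating yields (group name, sub-DataFrame) pairs
for country, rows in toy_grp:
    print(country, list(rows.index))
```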

### `DataFrame` object, for a group, returned from `DataFrameGroupBy.get_group(<group_name>)`

Now that we have created the country groups, we can view a `DataFrame` with the results from *'India' only* by using the `DataFrameGroupBy.get_group()` method.

The `DataFrameGroupBy.get_group()` method returns a `DataFrame` object.
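
Before running this on the full survey, the same call can be tried on a small made-up frame (the name `toy` and its values are hypothetical). The sketch only checks that the returned object is a `DataFrame` holding the rows of the requested group.

```python
import pandas as pd

# Hypothetical four-row extract
toy = pd.DataFrame({
    'Country': ['India', 'Germany', 'India', 'Brazil'],
    'SocialMedia': ['WhatsApp', 'Reddit', 'YouTube', 'WhatsApp'],
})

# Rows belonging to the 'India' group, with their original index labels
india = toy.groupby('Country').get_group('India')

print(type(india))
print(india)
```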

In [37]:
country_grp.get_group('India')

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
8,I code primarily as a hobby,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",India,,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",,"Developer, back-end;Engineer, site reliability",8,16,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...,Bash/Shell/PowerShell;C;C++;Elixir;Erlang;Go;P...,Cassandra;Elasticsearch;MongoDB;MySQL;Oracle;R...,Cassandra;DynamoDB;Elasticsearch;Firebase;Mong...,AWS;Docker;Heroku;Linux;MacOS;Slack,Android;Arduino;AWS;Docker;Google Cloud Platfo...,Express;Flask;React.js;Spring,Django;Express;Flask;React.js;Vue.js,Hadoop;Node.js;Pandas,Ansible;Apache Spark;Chef;Hadoop;Node.js;Panda...,Atom;IntelliJ;IPython / Jupyter;PyCharm;Visual...,Linux-based,Development;Testing;Production;Outside of work...,,Useful across many domains and could change ma...,Yes,SIGH,Yes,YouTube,In real life (in person),Handle,2012,A few times per week,Find answers to specific questions;Learn how t...,Less than once per week,Stack Overflow was slightly faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","Yes, definitely",A lot more welcome now than last year,Tech articles written by other developers;Indu...,24.0,Man,No,Straight / Heterosexual,,,Appropriate in length,Neither easy nor difficult
10,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,,"10,000 or more employees",Data or business analyst;Data scientist or mac...,12,20,10,Slightly dissatisfied,Slightly dissatisfied,Somewhat confident,Yes,Yes,"I’m not actively looking, but I am open to new...",3-4 years ago,,No,"Languages, frameworks, and other technologies ...",,INR,Indian rupee,950000.0,Yearly,13293.0,70.0,There's no schedule or spec; I work on what se...,,A few days each month,Home,Far above average,"Yes, because I see value in code review",4.0,"Yes, it's part of our process",,,C#;Go;JavaScript;Python;R;SQL,C#;Go;JavaScript;Kotlin;Python;R;SQL,Elasticsearch;MongoDB;Microsoft SQL Server;MyS...,Elasticsearch;MongoDB;Microsoft SQL Server,Linux;Windows,Android;Linux;Raspberry Pi;Windows,Angular/Angular.js;ASP.NET;Django;Express;Flas...,Angular/Angular.js;ASP.NET;Django;Express;Flas...,.NET;Node.js;Pandas;Torch/PyTorch,.NET;Node.js;TensorFlow;Torch/PyTorch,Android Studio;Eclipse;IPython / Jupyter;Notep...,Windows,,Not at all,Useful for immutable record keeping outside of...,No,Yes,Yes,YouTube,Neither,Screen Name,,Multiple times per day,Find answers to specific questions;Get a sense...,3-5 times per week,They were about the same,,Yes,A few times per month or weekly,Yes,"No, and I don't know what those are","Yes, somewhat",Somewhat less welcome now than last year,Tech articles written by other developers;Tech...,,,,,,Yes,Too long,Difficult
15,I am a student who is learning to code,Yes,Never,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",India,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,,Student,3,13,,,,,,,"I’m not actively looking, but I am open to new...",I've never had a job,,,"Industry that I'd be working in;Languages, fra...","Something else changed (education, award, medi...",,,,,,,,,,,,,,,,,Assembly;Bash/Shell/PowerShell;C;C++;HTML/CSS;...,Assembly;Bash/Shell/PowerShell;C;C++;C#;Go;HTM...,MariaDB;MySQL;Oracle;SQLite,MariaDB;MongoDB;Microsoft SQL Server;MySQL;Ora...,Linux;Windows,Android;Google Cloud Platform;iOS;Linux;MacOS;...,,Angular/Angular.js;ASP.NET;Django;Drupal;jQuer...,,.NET;.NET Core;Node.js;TensorFlow;Unity 3D;Unr...,Atom;NetBeans;Notepad++;Sublime Text;Vim,Linux-based,Development,,,Yes,Yes,What?,YouTube,In real life (in person),,2018,Daily or almost daily,Find answers to specific questions;Learn how t...,More than 10 times per week,They were about the same,,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,20.0,Man,No,,,Yes,Too long,Neither easy nor difficult
50,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of LOWER quality than prop...",Employed full-time,India,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Another engineering discipline (ex. civil, ele...",Received on-the-job training in software devel...,"10,000 or more employees","Developer, back-end;DevOps specialist",7,15,2,Slightly satisfied,Very satisfied,Very confident,Not sure,Yes,"I’m not actively looking, but I am open to new...",1-2 years ago,"Write code by hand (e.g., on a whiteboard);Int...",No,Specific department or team I'd be working on;...,I was preparing for a job search,INR,Indian rupee,400000.0,Yearly,5597.0,7.0,There is a schedule and/or spec (made by me or...,Meetings;Time spent commuting,Less than once per month / Never,"Other place, such as a coworking space or cafe",Average,No,,"Yes, it's not part of our process but the deve...","The CTO, CIO, or other management purchase new...",I have little or no influence,Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...,HTML/CSS;JavaScript;Python,Elasticsearch;Firebase;MariaDB;MongoDB;MySQL;O...,Firebase;PostgreSQL;Redis;Other(s):,Arduino;AWS;Heroku;Linux;MacOS;Raspberry Pi;Wo...,AWS;Docker;Heroku;Kubernetes;Linux;MacOS;WordP...,Django;Express;Flask;jQuery,Express;Flask;jQuery;React.js;Vue.js,Node.js,Node.js,Notepad++;Visual Studio Code,MacOS,Testing,Not at all,Useful for immutable record keeping outside of...,Yes,Also Yes,What?,YouTube,In real life (in person),Username,2012,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was slightly faster,11-30 minutes,Yes,Less than once per month or monthly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, definitely",Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,23.0,Man,No,,South Asian,No,Too long,Easy
65,I am a developer by profession,Yes,Never,,Employed full-time,India,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Information systems, information technology, o...",,20 to 99 employees,"Developer, front-end;Developer, mobile",2,17,2,Very satisfied,Very satisfied,Very confident,No,Not sure,"I’m not actively looking, but I am open to new...",Less than a year ago,Write any code;Solve a brain-teaser style puzz...,No,"Languages, frameworks, and other technologies ...","My job status changed (promotion, new job, etc.)",INR,Indian rupee,,Monthly,,48.0,There's no schedule or spec; I work on what se...,,About half the time,Office,Average,"Yes, because I see value in code review",,"Yes, it's not part of our process but the deve...",Not sure,,Assembly;C;C++;C#;HTML/CSS;Java,Kotlin,Firebase;MySQL;Oracle;SQLite,Firebase;SQLite,Android,Android,ASP.NET,,,,Android Studio;IntelliJ,Linux-based,,,,Yes,Yes,What?,WhatsApp,In real life (in person),,2017,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was slightly faster,11-30 minutes,Yes,A few times per week,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are",Not sure,A lot more welcome now than last year,,21.0,Man,No,,,Yes,Appropriate in length,Neither easy nor difficult
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39965,I am a developer by profession,No,Never,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Taken a part-time in-person course in programm...,20 to 99 employees,"Developer, back-end;Developer, front-end;Devel...",5,20,2,Neither satisfied nor dissatisfied,Slightly dissatisfied,Somewhat confident,Not sure,Yes,"I’m not actively looking, but I am open to new...",3-4 years ago,Write any code;Solve a brain-teaser style puzzle,No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,INR,Indian rupee,600000.0,Yearly,8396.0,40.0,There's no schedule or spec; I work on what se...,Lack of support from management;Meetings,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Not sure,I have a great deal of influence,HTML/CSS;Java;JavaScript;TypeScript,Dart;Java;JavaScript;Kotlin;Python;SQL;TypeScript,MariaDB;MySQL,Cassandra;MariaDB;MongoDB;MySQL,,,Angular/Angular.js;Spring,Angular/Angular.js;Django;Spring,,Flutter;TensorFlow,Eclipse;Notepad++;Visual Studio Code,Windows,I do not use containers,Not at all,,No,Yes,What?,YouTube,Online,Username,2015,Daily or almost daily,Find answers to specific questions;Pass the ti...,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,A few times per month or weekly,Yes,"No, I've heard of them, but I am not part of a...","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,29.0,Man,No,Straight / Heterosexual,South Asian,Yes,Appropriate in length,Neither easy nor difficult
39967,I am a student who is learning to code,Yes,Once a month or more often,The quality of OSS and closed source software ...,"Not employed, but looking for work",India,"Yes, full-time","Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,Taken an online course in programming or softw...,,"Academic researcher;Designer;Developer, back-e...",Less than 1 year,22,,,,,,,I am actively looking for a job,I've never had a job,,,"Languages, frameworks, and other technologies ...",I was preparing for a job search,,,,,,,,,,,,,,,,,,Assembly;HTML/CSS;JavaScript;Python,,MySQL,,Heroku;Windows,Django,Django,,,Atom;IntelliJ;Notepad++;PyCharm;Visual Studio,Windows,Testing,,,Yes,Yes,What?,LinkedIn,In real life (in person),Username,2018,Multiple times per day,Find answers to specific questions;Learn how t...,More than 10 times per week,Stack Overflow was much faster,0-10 minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are",Neutral,Somewhat more welcome now than last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,South Asian,Yes,Appropriate in length,Easy
39980,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"1,000 to 4,999 employees","Developer, back-end;Developer, desktop or ente...",4,20,2,Very satisfied,Slightly satisfied,Somewhat confident,No,No,I am actively looking for a job,1-2 years ago,"Write any code;Write code by hand (e.g., on a ...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,INR,Indian rupee,32000.0,Monthly,5376.0,35.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Distrac...,Less than once per month / Never,Office,Average,No,,"No, but I think we should",Not sure,I have little or no influence,JavaScript;Ruby,JavaScript;Python,Firebase;MongoDB,Firebase;MongoDB;MySQL;PostgreSQL;Redis,Docker;Kubernetes,Docker;Kubernetes,Angular/Angular.js;Express;Flask;React.js,Angular/Angular.js;Express;Flask;React.js;Othe...,Cordova;Node.js,Node.js;Pandas,PyCharm;Visual Studio Code,Windows,Development;Testing,Non-currency applications of blockchain,Useful across many domains and could change ma...,Yes,Yes,No,YouTube,In real life (in person),UserID,2014,Multiple times per day,Find answers to specific questions;Learn how t...,More than 10 times per week,Stack Overflow was much faster,60+ minutes,Yes,Daily or almost daily,Yes,"No, I've heard of them, but I am not part of a...","Yes, definitely",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,23.0,Man,No,Straight / Heterosexual,South Asian,No,Appropriate in length,Neither easy nor difficult
39987,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,20 to 99 employees,"Developer, full-stack;Engineering manager",10,18,6,Slightly satisfied,Slightly satisfied,,,,"I’m not actively looking, but I am open to new...",1-2 years ago,"Write any code;Write code by hand (e.g., on a ...",Yes,"Languages, frameworks, and other technologies ...","Something else changed (education, award, medi...",INR,Indian rupee,800000.0,Yearly,11194.0,45.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Not eno...,Less than once per month / Never,"Other place, such as a coworking space or cafe",Far above average,"Yes, because I see value in code review",2.0,"No, but I think we should",Developers and management have nearly equal in...,I have some influence,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;T...,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;R...,Microsoft SQL Server;PostgreSQL,PostgreSQL;Redis,Arduino;AWS;Docker;Google Cloud Platform;Linux...,Arduino;Docker;Kubernetes;Linux,ASP.NET;Vue.js,Django;Express;Vue.js,.NET Core;Node.js,.NET Core;Node.js,Vim;Visual Studio;Visual Studio Code,Linux-based,Development;Testing;Production,Not at all,,No,Yes,What?,WhatsApp,In real life (in person),Username,2013,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,They were about the same,,Yes,A few times per month or weekly,Yes,"No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech meetups or events in your area,29.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Easy


In [38]:
type(country_grp.get_group('India'))

pandas.core.frame.DataFrame

### `SeriesGroupBy` object returned from `DataFrameGroupBy[<column>]`

Now, if we want to see the **most used social media sites grouped by countries**:

In [39]:
country_grp['SocialMedia']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000017D6CDCE308>

In [40]:
type(country_grp['SocialMedia'])

pandas.core.groupby.generic.SeriesGroupBy

### `Series` object, for a group, returned from `SeriesGroupBy.get_group(<group_name>)`

In [41]:
country_grp['SocialMedia'].get_group('India')

Respondent
8         YouTube
10        YouTube
15        YouTube
50        YouTube
65       WhatsApp
           ...   
39965     YouTube
39967    LinkedIn
39980     YouTube
39987    WhatsApp
39995    WhatsApp
Name: SocialMedia, Length: 4072, dtype: object

Note: 
- Just like `DataFrame[<column>]` returns a `Series` object,
- `DataFrameGroupBy[<column>]` returns a `SeriesGroupBy` object.<br><br>

- `DataFrameGroupBy.get_group()` returns a `DataFrame` object,
- `SeriesGroupBy.get_group()` returns a `Series` object,
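
The four return types listed in the note can be checked on a small example. This is only a sketch: the `toy` DataFrame and its values are hypothetical stand-ins for the survey data, and the import path for the groupby classes is the one printed by `type()` in the cells above.

```python
import pandas as pd
from pandas.core.groupby.generic import DataFrameGroupBy, SeriesGroupBy

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["India", "India", "China", "India"],
    "SocialMedia": ["WhatsApp", "YouTube", "WeChat", "WhatsApp"],
    "Age": [24, 31, 28, 22],
})

toy_grp = toy.groupby("Country")       # DataFrameGroupBy
sm_grp = toy_grp["SocialMedia"]        # SeriesGroupBy

india_df = toy_grp.get_group("India")  # DataFrame holding India's rows
india_sm = sm_grp.get_group("India")   # Series holding India's values

print(type(toy_grp).__name__, type(sm_grp).__name__)
print(type(india_df).__name__, type(india_sm).__name__)
```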

In [42]:
country_grp['SocialMedia'].value_counts()

Country      SocialMedia             
Afghanistan  Facebook                     7
             I don't use social media     2
             YouTube                      2
             Instagram                    1
             WhatsApp                     1
                                         ..
Zambia       Twitter                      1
Zimbabwe     WhatsApp                    10
             Twitter                      7
             Facebook                     2
             Instagram                    2
Name: SocialMedia, Length: 1041, dtype: int64

In [43]:
type(country_grp['SocialMedia'].value_counts())

pandas.core.series.Series

Note: 
- Just like `Series.value_counts()` returns a `Series` object,
- `SeriesGroupBy.value_counts()` returns a `Series` object with a two-level `MultiIndex` (the group key first, then the counted value).
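
To make the two index levels concrete, here is a minimal sketch on hypothetical toy data (the `toy` DataFrame is an assumption, not part of the survey):

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["India", "India", "China", "India"],
    "SocialMedia": ["WhatsApp", "YouTube", "WeChat", "WhatsApp"],
})

counts = toy.groupby("Country")["SocialMedia"].value_counts()

print(counts.index.nlevels)      # number of index levels
print(list(counts.index.names))  # names of the index levels

# Selecting on the first level only leaves a Series indexed by SocialMedia.
india_counts = counts.loc["India"]

# Selecting on both levels returns a single count.
india_whatsapp = counts.loc[("India", "WhatsApp")]
```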

### Most used social media sites in India: `DataFrameGroubBy['SocialMedia'].get_group('India').value_counts()` or `DataFrameGroubBy['SocialMedia'].value_counts().loc['India']`, where `DataFrameGroupBy` is `DataFrame.groupby('Country')`

Now, if we want to view a `Series` of the **most used social media sites** with results from **India only**, we can use the `.loc` indexer on the `MultiIndex` `Series` above, passing *'India'* as the first-level label.

In [44]:
country_grp['SocialMedia'].value_counts().loc['India']

SocialMedia
WhatsApp                    1370
YouTube                      827
LinkedIn                     418
Facebook                     398
Instagram                    346
Twitter                      238
Reddit                       209
I don't use social media     117
Snapchat                      10
WeChat 微信                      3
Hello                          2
VK ВКонта́кте                  2
Weibo 新浪微博                     1
Youku Tudou 优酷                 1
Name: SocialMedia, dtype: int64

Or you could also simply say:

In [45]:
country_grp['SocialMedia'].get_group('India').value_counts()

WhatsApp                    1370
YouTube                      827
LinkedIn                     418
Facebook                     398
Instagram                    346
Twitter                      238
Reddit                       209
I don't use social media     117
Snapchat                      10
WeChat 微信                      3
Hello                          2
VK ВКонта́кте                  2
Youku Tudou 优酷                 1
Weibo 新浪微博                     1
Name: SocialMedia, dtype: int64
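
Both routes give the same counts; only the order of tied entries may differ, as the last two rows of the two outputs above show. A minimal sketch that compares the two routes on hypothetical toy data (the `toy` DataFrame is an assumption) by converting each result to a dictionary, which ignores ordering:

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["India", "India", "India", "China", "India"],
    "SocialMedia": ["WhatsApp", "YouTube", "WhatsApp", "WeChat", "LinkedIn"],
})

sm_grp = toy.groupby("Country")["SocialMedia"]

# Route 1: count within every group, then pick India.
via_loc = sm_grp.value_counts().loc["India"]

# Route 2: pick India's values first, then count.
via_get_group = sm_grp.get_group("India").value_counts()

# Dictionaries compare labels and counts without regard to row order.
same_counts = via_loc.to_dict() == via_get_group.to_dict()
print(same_counts)
```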

Now we can see the results for any country, without having to create filters for each country in the world.

In [46]:
country_grp['SocialMedia'].value_counts().loc['China']

SocialMedia
WeChat 微信                   181
YouTube                      27
Weibo 新浪微博                   17
I don't use social media     12
Twitter                       9
LinkedIn                      6
Reddit                        4
Instagram                     3
Youku Tudou 优酷                3
Facebook                      2
WhatsApp                      2
VK ВКонта́кте                 1
Name: SocialMedia, dtype: int64

We can also view each site's share of the responses within a specific country by passing the `normalize=True` argument to the `SeriesGroupBy.value_counts()` method:

In [47]:
country_grp['SocialMedia'].value_counts(normalize=True).loc['United States']

SocialMedia
Reddit                      0.291708
Twitter                     0.175158
Facebook                    0.140526
YouTube                     0.121101
I don't use social media    0.092019
Instagram                   0.082806
LinkedIn                    0.047397
WhatsApp                    0.027750
Snapchat                    0.015651
WeChat 微信                   0.004662
VK ВКонта́кте               0.000555
Weibo 新浪微博                  0.000555
Youku Tudou 优酷              0.000111
Name: SocialMedia, dtype: float64
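
With `normalize=True`, the proportions are computed within each group, so they should add up to 1 for every country rather than across the whole `Series`. A small sketch on hypothetical toy data (the `toy` DataFrame is an assumption) illustrates this:

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["India", "India", "India", "India", "China", "China"],
    "SocialMedia": ["WhatsApp", "WhatsApp", "WhatsApp", "YouTube", "WeChat", "Weibo"],
})

props = toy.groupby("Country")["SocialMedia"].value_counts(normalize=True)

# Summing the proportions per country should give 1.0 for each group.
per_country_total = props.groupby(level="Country").sum()

print(props)
print(per_country_total)
```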

### `.groupby()` method vs filtering

Now, we could have achieved all of this without using the `.groupby()` method, by using **country-specific filters** as below:

In [48]:
country_flt = df['Country'] == 'India'
df.loc[country_flt, 'SocialMedia'].value_counts()

WhatsApp                    1370
YouTube                      827
LinkedIn                     418
Facebook                     398
Instagram                    346
Twitter                      238
Reddit                       209
I don't use social media     117
Snapchat                      10
WeChat 微信                      3
Hello                          2
VK ВКонта́кте                  2
Youku Tudou 优酷                 1
Weibo 新浪微博                     1
Name: SocialMedia, dtype: int64

The only problem with creating **country-specific filters** is that you would have to create one filter for each and every country in the world to view the results globally. Therefore, the `.groupby()` method is more practical in such situations.
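
The trade-off can be sketched as follows, on hypothetical toy data (the `toy` DataFrame and the dictionary names are assumptions): the filter approach needs an explicit loop over countries, while a single `.groupby()` call covers all of them at once.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["India", "India", "China", "India", "China"],
    "SocialMedia": ["WhatsApp", "YouTube", "WeChat", "WhatsApp", "WeChat"],
})

# Filter approach: one boolean mask per country, built in a loop.
per_country_filters = {}
for country in toy["Country"].unique():
    country_flt = toy["Country"] == country
    per_country_filters[country] = (
        toy.loc[country_flt, "SocialMedia"].value_counts().to_dict()
    )

# Groupby approach: one call computes the counts for every country.
grouped = toy.groupby("Country")["SocialMedia"].value_counts()
per_country_groupby = {
    country: grouped.loc[country].to_dict()
    for country in grouped.index.get_level_values("Country").unique()
}

print(per_country_filters == per_country_groupby)
```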

### Summary of grouping

- `DataFrame.groupby()` returns a `DataFrameGroupBy` object that contains information about the *groups*.
- `DataFrameGroupBy.get_group()` returns a `DataFrame` object for the specified group.<br><br>
- `DataFrameGroupBy[<column>]` returns a `SeriesGroupBy` object.
- `SeriesGroupBy.get_group()` returns a `Series` object for the specified group.
- `SeriesGroupBy.value_counts()` returns a `Series` object with a `MultiIndex`.<br><br>
- This is how you can get **Most used social media sites in India**:
    - `DataFrameGroupBy['SocialMedia'].get_group('India').value_counts()` or
    - `DataFrameGroupBy['SocialMedia'].value_counts().loc['India']`
    - where `DataFrameGroupBy` is `DataFrame.groupby('Country')`

## Aggregating results after grouping

In [49]:
country_grp['ConvertedComp'].median()

Country
Afghanistan                              2796.0
Albania                                 11040.0
Algeria                                  6864.0
Andorra                                     NaN
Angola                                   7764.0
                                         ...   
Venezuela, Bolivarian Republic of...    12000.0
Viet Nam                                10086.0
Yemen                                   13854.0
Zambia                                  12607.0
Zimbabwe                                18000.0
Name: ConvertedComp, Length: 173, dtype: float64

In [50]:
country_grp['ConvertedComp'].median().loc['Germany']

63016.0
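
The `NaN` reported for Andorra above presumably means that no respondent from that country provided a salary; this is an assumption about the data, not something the output states. The sketch below, on hypothetical toy data, shows the underlying behaviour: the grouped median skips missing values, and a group with only missing values yields `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Germany has one missing salary, Andorra has only missing salaries.
toy = pd.DataFrame({
    "Country": ["Germany", "Germany", "Germany", "Andorra", "Andorra"],
    "ConvertedComp": [50000.0, 63000.0, np.nan, np.nan, np.nan],
})

medians = toy.groupby("Country")["ConvertedComp"].median()

# Germany's median uses the two reported salaries; Andorra has none to use.
print(medians)
```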

***

### `DataFrame.agg()` and `Series.agg()` methods

> *Signature*: 
- `DataFrame.agg(func, axis=0, *args, **kwargs)`
- `Series.agg(func, axis=0, *args, **kwargs)`

> *Docstring*:<br>
Aggregate using one or more operations over the specified axis.

> *Returns*:
- scalar : when `Series.agg()` is called with a SINGLE function
- `Series` : when `DataFrame.agg()` is called with a SINGLE function (or when `Series.agg()` is called with SEVERAL functions)
- `DataFrame` : when `DataFrame.agg()` is called with SEVERAL functions
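
The return types listed above can be verified on a small numeric example. This is a sketch on a hypothetical `toy` DataFrame; the column names are borrowed from the survey, the values are made up.

```python
import pandas as pd

# Hypothetical numeric toy data.
toy = pd.DataFrame({
    "WorkWeekHrs": [40.0, 45.0, 35.0],
    "Age": [24.0, 31.0, 29.0],
})

scalar_result = toy["Age"].agg("mean")              # Series + single function -> scalar
series_from_series = toy["Age"].agg(["min", "max"])  # Series + several functions -> Series
series_from_frame = toy.agg("mean")                  # DataFrame + single function -> Series
frame_result = toy.agg(["min", "mean", "max"])       # DataFrame + several functions -> DataFrame

for result in (scalar_result, series_from_series, series_from_frame, frame_result):
    print(type(result).__name__)
```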


**Example-1** of 'Returns': **scalar : when** `Series.agg()` **is called with a SINGLE function:**

In [51]:
df['ConvertedComp'].agg('mean')

126511.9085757612

In [52]:
type(df['ConvertedComp'].agg('mean'))

float

**Example-2** of 'Returns': `Series` : **when** `DataFrame.agg()` **is called with a SINGLE function:**

In [53]:
df.agg('mean')

CompTotal        1.194709e+12
ConvertedComp    1.265119e+05
WorkWeekHrs      4.224593e+01
CodeRevHrs       5.131483e+00
Age              3.034612e+01
dtype: float64

In [54]:
type(df.agg('mean'))

pandas.core.series.Series
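
The result above contains only the numeric columns, which suggests that the pandas version used here dropped the text columns automatically. Newer pandas releases (2.0 and later, as far as I am aware) may raise a `TypeError` instead when a mean is requested over text columns. A version-independent sketch, on hypothetical toy data, is to select the numeric columns explicitly first:

```python
import pandas as pd

# Hypothetical toy data mixing a text column with numeric columns.
toy = pd.DataFrame({
    "Country": ["India", "Canada", "Germany"],
    "ConvertedComp": [10000.0, 60000.0, 50000.0],
    "Age": [24.0, 31.0, 29.0],
})

# Keep only numeric columns, then aggregate.
numeric_means = toy.select_dtypes(include="number").agg("mean")

print(numeric_means)
```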

**Example-3** of 'Returns': `DataFrame` : **when** `DataFrame.agg()` **is called with SEVERAL functions:**

In [55]:
df.agg(['min', 'mean', 'max'])

| | MainBranch | Hobbyist | OpenSourcer | Country | CompTotal | ConvertedComp | WorkWeekHrs | CodeRevHrs | Age |
|---|---|---|---|---|---|---|---|---|---|
| min | I am a developer by profession | No | Less than once a month but more than once per ... | Afghanistan | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| max | I used to be a developer by profession, but no... | Yes | Once a month or more often | Zimbabwe | 1e+16 | 2000000.0 | 4125.0 | 99.0 | 99.0 |
| mean | NaN | NaN | NaN | NaN | 1194709000000.0 | 126511.9 | 42.245933 | 5.131483 | 30.346116 |


In [56]:
type(df.agg(['min', 'mean', 'max']))

pandas.core.frame.DataFrame

In [57]:
df.agg(['min', 'mean', 'max']).shape

(3, 9)

### `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` methods

Grouping adds a *group* dimension: when we go from a `DataFrame` to a `DataFrameGroupBy` object (or from a `Series` to a `SeriesGroupBy` object), each aggregation is computed once per group. As a result, `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` return a data type with one more dimension than their ungrouped counterparts, unless the ungrouped result is already a `DataFrame`.

**Counterpart of Example-1 above: when** `SeriesGroupBy.agg()` **is called with a SINGLE function:**

In [58]:
country_grp['ConvertedComp'].agg('mean')

Country
Afghanistan                              6025.500000
Albania                                 30114.120000
Algeria                                  8087.111111
Andorra                                          NaN
Angola                                   7764.000000
                                            ...     
Venezuela, Bolivarian Republic of...    22720.181818
Viet Nam                                19874.647059
Yemen                                   13854.000000
Zambia                                  10254.333333
Zimbabwe                                19285.714286
Name: ConvertedComp, Length: 173, dtype: float64

In [59]:
type(country_grp['ConvertedComp'].agg('mean'))

pandas.core.series.Series

So, a `Series` object is returned here (one mean per country), whereas Example-1 returned a scalar.

**Counterpart of Example-2 above: when** `DataFrameGroupBy.agg()` **is called with a SINGLE function:**

In [60]:
country_grp.agg('mean')

| Country | CompTotal | ConvertedComp | WorkWeekHrs | CodeRevHrs | Age |
|---|---|---|---|---|---|
| Afghanistan | 3.775050e+04 | 6025.500000 | 46.777778 | 16.625000 | 21.571429 |
| Albania | 2.880040e+05 | 30114.120000 | 39.300000 | 5.720000 | 25.214286 |
| Algeria | 7.077783e+04 | 8087.111111 | 41.411765 | 7.000000 | 27.940000 |
| Andorra | NaN | NaN | NaN | NaN | 20.000000 |
| Angola | 2.020000e+05 | 7764.000000 | 45.000000 | 10.000000 | 22.000000 |
| ... | ... | ... | ... | ... | ... |
| Venezuela, Bolivarian Republic of... | 8.416505e+04 | 22720.181818 | 45.148148 | 8.142857 | 28.025000 |
| Viet Nam | 1.850297e+07 | 19874.647059 | 43.800000 | 11.141026 | 25.924242 |
| Yemen | 1.750000e+05 | 13854.000000 | 55.000000 | 5.500000 | 33.000000 |
| Zambia | 5.600000e+04 | 10254.333333 | 57.500000 | 18.500000 | 30.333333 |


In [61]:
type(country_grp.agg('mean'))

pandas.core.frame.DataFrame

Here, a `DataFrame` object is returned, whereas Example-2 returned a `Series` object.

**Counterpart of Example-3 above: when** `DataFrameGroupBy.agg()` **is called with SEVERAL functions:**

In [62]:
country_grp.agg(['min', 'mean', 'max'])

Unnamed: 0_level_0,CompTotal,CompTotal,CompTotal,ConvertedComp,ConvertedComp,ConvertedComp,WorkWeekHrs,WorkWeekHrs,WorkWeekHrs,CodeRevHrs,CodeRevHrs,CodeRevHrs,Age,Age,Age
Unnamed: 0_level_1,min,mean,max,min,mean,max,min,mean,max,min,mean,max,min,mean,max
Country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2
Afghanistan,1.0,3.775050e+04,120000.0,0.0,6025.500000,19152.0,1.0,46.777778,168.0,1.0,16.625000,90.0,1.0,21.571429,26.0
Albania,400.0,2.880040e+05,2688000.0,4416.0,30114.120000,187668.0,8.0,39.300000,65.0,1.0,5.720000,20.0,15.0,25.214286,40.0
Algeria,1.0,7.077783e+04,230000.0,0.0,8087.111111,23376.0,6.0,41.411765,168.0,1.0,7.000000,14.0,16.0,27.940000,56.0
Andorra,,,,,,,,,,,,,20.0,20.000000,20.0
Angola,202000.0,2.020000e+05,202000.0,7764.0,7764.000000,7764.0,45.0,45.000000,45.0,10.0,10.000000,10.0,22.0,22.000000,22.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Venezuela, Bolivarian Republic of...",30.0,8.416505e+04,1200000.0,360.0,22720.181818,137484.0,3.0,45.148148,160.0,1.0,8.142857,50.0,17.0,28.025000,60.0
Viet Nam,0.0,1.850297e+07,100000000.0,517.0,19874.647059,140000.0,8.0,43.800000,160.0,0.5,11.141026,99.0,13.0,25.924242,46.0
Yemen,50000.0,1.750000e+05,300000.0,13332.0,13854.000000,14376.0,40.0,55.000000,70.0,1.0,5.500000,10.0,30.0,33.000000,39.0
Zambia,5000.0,5.600000e+04,150000.0,5040.0,10254.333333,13116.0,40.0,57.500000,75.0,7.0,18.500000,30.0,23.0,30.333333,49.0


In [63]:
type(country_grp.agg(['min', 'mean', 'max']))

pandas.core.frame.DataFrame

Here, a `DataFrame` object is returned, just as in Example-3, except that it now carries the additional *group* dimension: there is one row per country, and the aggregation functions form a second level of column labels.
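
The three grouped counterparts can be reproduced side by side on a small example. This is a sketch on hypothetical toy data containing only the grouping column and numeric columns (the values are made up):

```python
import pandas as pd

# Hypothetical toy data: one grouping column plus numeric columns only.
toy = pd.DataFrame({
    "Country": ["India", "India", "China", "China"],
    "ConvertedComp": [10000.0, 20000.0, 30000.0, 50000.0],
    "Age": [24.0, 30.0, 28.0, 32.0],
})
toy_grp = toy.groupby("Country")

series_result = toy_grp["ConvertedComp"].agg("mean")  # scalar per group -> Series
frame_result = toy_grp.agg("mean")                    # Series per group -> DataFrame
multi_result = toy_grp.agg(["min", "mean", "max"])    # DataFrame with two column levels

print(type(series_result).__name__)
print(type(frame_result).__name__)
print(multi_result.columns.nlevels)
```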

***

If we want the **mean and median salaries for respondents from Canada:**

In [64]:
country_grp['ConvertedComp'].agg(['mean', 'median']).loc['Canada']

mean      133818.897614
median     68705.000000
Name: Canada, dtype: float64

Or we could also say:

In [65]:
country_grp['ConvertedComp'].get_group('Canada').agg(['mean', 'median'])

mean      133818.897614
median     68705.000000
Name: ConvertedComp, dtype: float64
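
The two routes return the same values; as the outputs above show, only the `Name` of the resulting `Series` differs (the group label in the first case, the column name in the second). A minimal sketch on hypothetical toy data (the salaries are made up):

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame.
toy = pd.DataFrame({
    "Country": ["Canada", "Canada", "Canada", "India", "India"],
    "ConvertedComp": [60000.0, 70000.0, 200000.0, 10000.0, 20000.0],
})
comp_grp = toy.groupby("Country")["ConvertedComp"]

# Route 1: aggregate every group, then pick Canada's row.
via_loc = comp_grp.agg(["mean", "median"]).loc["Canada"]

# Route 2: pick Canada's values, then aggregate that one Series.
via_get_group = comp_grp.get_group("Canada").agg(["mean", "median"])

print(via_loc.name, via_get_group.name)
```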

<h2>Challenge</h2>

### Task

We want to create a new `DataFrame` with the following structure:

> Columns:
> - `NumRespondents`: total number of developers who answered the 'LanguageWorkedWith' question
> - `NumKnowPython`: number of developers who listed 'Python' as one of the languages they know
> - `PctKnowPython`: percentage of those respondents who know 'Python'
>
> Rows:
> - Countries of the developers

### Solution

In this solution we will:
1. Create a `Series` for *'NumRespondents'*,
2. Create another `Series` for *'NumKnowPython'*,
3. 'Concatenate' both `Series` into a `DataFrame`,
4. Rename the columns of the `DataFrame` to something more comprehensible,
5. Add a calculated column for *'PctKnowPython'* to this `DataFrame`, and
6. Sort this `DataFrame` however we prefer.
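
Before walking through the steps on the survey data, here is the whole pipeline sketched on hypothetical toy data. The `toy` DataFrame and the name `summary` are assumptions, and `na=False` is an addition of this sketch so that missing answers count as "not Python".

```python
import pandas as pd

# Hypothetical toy data; None marks a respondent who skipped the question.
toy = pd.DataFrame({
    "Country": ["India", "India", "India", "Canada", "Canada", "Canada"],
    "LanguageWorkedWith": ["Python;SQL", "Java", None, "Python", "C#;Python", "Go"],
})
lang_grp = toy.groupby("Country")["LanguageWorkedWith"]

# Steps 1 and 2: one Series per measure, both indexed by country.
num_respondents = lang_grp.count()  # counts non-missing answers only
num_know_python = lang_grp.apply(lambda s: s.str.contains("Python", na=False).sum())

# Steps 3 and 4: place the Series side by side and rename the columns.
summary = pd.concat([num_respondents, num_know_python], axis="columns")
summary.columns = ["NumRespondents", "NumKnowPython"]

# Steps 5 and 6: add the percentage column and sort by it.
summary["PctKnowPython"] = summary["NumKnowPython"] / summary["NumRespondents"] * 100
summary = summary.sort_values(by="PctKnowPython", ascending=False)

print(summary)
```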

In [66]:
num_respondents = country_grp['LanguageWorkedWith'].count()

In [67]:
num_respondents

Country
Afghanistan                             15
Albania                                 41
Algeria                                 57
Andorra                                  2
Angola                                   1
                                        ..
Venezuela, Bolivarian Republic of...    43
Viet Nam                                92
Yemen                                    5
Zambia                                   6
Zimbabwe                                21
Name: LanguageWorkedWith, Length: 173, dtype: int64

In [68]:
# num_know_python = country_grp['LanguageWorkedWith'].str.contains('Python').sum()

The commented-out command above would raise an `AttributeError`, since a `SeriesGroupBy` object has no `str` attribute. Therefore, we have to use the `.apply()` method of the `SeriesGroupBy` object, which calls our function on each group's `Series`, where `.str` is available.
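
The difference can be sketched on hypothetical toy data: accessing `.str` on the grouped object fails, while the function passed to `.apply()` receives an ordinary `Series` for each group. The `toy` DataFrame is an assumption, and `na=False` is added here so that missing answers are treated as `False`.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing answer.
toy = pd.DataFrame({
    "Country": ["India", "India", "Canada", "Canada"],
    "LanguageWorkedWith": ["Python;SQL", None, "Go", "C#;Python"],
})
lang_grp = toy.groupby("Country")["LanguageWorkedWith"]

# The grouped object itself does not expose the .str accessor.
try:
    lang_grp.str
    has_str = True
except AttributeError:
    has_str = False

# .apply() hands each group's Series to the function, where .str works.
num_know_python = lang_grp.apply(lambda s: s.str.contains("Python", na=False).sum())

print(has_str)
print(num_know_python)
```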

In [69]:
num_know_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())

In [70]:
country_grp['LanguageWorkedWith']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000017D6F100B08>

In [71]:
num_know_python

Country
Afghanistan                              3
Albania                                 13
Algeria                                 13
Andorra                                  0
Angola                                   1
                                        ..
Venezuela, Bolivarian Republic of...    16
Viet Nam                                36
Yemen                                    1
Zambia                                   1
Zimbabwe                                10
Name: LanguageWorkedWith, Length: 173, dtype: int64

In [72]:
df_know_python = pd.concat([num_respondents, num_know_python], axis='columns')

Note: In `pandas.concat()`, we set `axis='columns'` to concatenate along the *columns* axis, which places the two `Series` side by side; by default, concatenation happens along the `index` (rows) axis.
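
The effect of the `axis` argument can be seen on two short `Series`. This is a sketch; the two `Series` are built by hand here and reuse the first two rows of the results above.

```python
import pandas as pd

# Two small Series sharing the same index.
respondents = pd.Series([15, 41], index=["Afghanistan", "Albania"], name="NumRespondents")
know_python = pd.Series([3, 13], index=["Afghanistan", "Albania"], name="NumKnowPython")

# Default (rows axis): the second Series is appended below the first.
stacked = pd.concat([respondents, know_python])

# Columns axis: the Series become columns, aligned on their shared index.
side_by_side = pd.concat([respondents, know_python], axis="columns")

print(stacked.shape, side_by_side.shape)
```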

In [73]:
df_know_python

| Country | LanguageWorkedWith | LanguageWorkedWith |
|---|---|---|
| Afghanistan | 15 | 3 |
| Albania | 41 | 13 |
| Algeria | 57 | 13 |
| Andorra | 2 | 0 |
| Angola | 1 | 1 |
| ... | ... | ... |
| Venezuela, Bolivarian Republic of... | 43 | 16 |
| Viet Nam | 92 | 36 |
| Yemen | 5 | 1 |
| Zambia | 6 | 1 |


In [74]:
df_know_python.columns = ['NumRespondents', 'NumKnowPython']

In [75]:
df_know_python

| Country | NumRespondents | NumKnowPython |
|---|---|---|
| Afghanistan | 15 | 3 |
| Albania | 41 | 13 |
| Algeria | 57 | 13 |
| Andorra | 2 | 0 |
| Angola | 1 | 1 |
| ... | ... | ... |
| Venezuela, Bolivarian Republic of... | 43 | 16 |
| Viet Nam | 92 | 36 |
| Yemen | 5 | 1 |
| Zambia | 6 | 1 |


In [76]:
df_know_python['PctKnowPython'] = (df_know_python['NumKnowPython']/df_know_python['NumRespondents']) * 100

In [77]:
df_know_python

| Country | NumRespondents | NumKnowPython | PctKnowPython |
|---|---|---|---|
| Afghanistan | 15 | 3 | 20.000000 |
| Albania | 41 | 13 | 31.707317 |
| Algeria | 57 | 13 | 22.807018 |
| Andorra | 2 | 0 | 0.000000 |
| Angola | 1 | 1 | 100.000000 |
| ... | ... | ... | ... |
| Venezuela, Bolivarian Republic of... | 43 | 16 | 37.209302 |
| Viet Nam | 92 | 36 | 39.130435 |
| Yemen | 5 | 1 | 20.000000 |
| Zambia | 6 | 1 | 16.666667 |


And now, a reasonable way to sort this `DataFrame` could be to sort by *'PctKnowPython'* in descending order.

In [78]:
df_know_python.sort_values(by='PctKnowPython', ascending=False, inplace=True)

In [79]:
df_know_python.head(50)

| Country | NumRespondents | NumKnowPython | PctKnowPython |
|---|---|---|---|
| Benin | 3 | 3 | 100.0 |
| Oman | 4 | 4 | 100.0 |
| Guinea | 1 | 1 | 100.0 |
| Angola | 1 | 1 | 100.0 |
| Timor-Leste | 1 | 1 | 100.0 |
| Mali | 1 | 1 | 100.0 |
| Sierra Leone | 1 | 1 | 100.0 |
| Bahamas | 1 | 1 | 100.0 |
| Dominica | 1 | 1 | 100.0 |
| Niger | 1 | 1 | 100.0 |
