# Filtering

## Basic filtering

In [1]:
people = {
    "first" : ["Corey", "Jane", "John"],
    "last" : ["Schafer", "Doe", "Doe"],
    "email" : ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@gmail.com"]
}

In [2]:
import pandas as pd

In [4]:
df = pd.DataFrame(people)
df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [6]:
filt = df['last'] == 'Doe'

In [7]:
df[filt]

,first,last,email
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [8]:
df.loc[filt]

,first,last,email
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [13]:
df.loc[df['last']=='Doe', 'email']

1    JaneDoe@gmail.com
2    JohnDoe@gmail.com
Name: email, dtype: object

In [12]:
df.loc[df['last']=='Doe', ['last', 'email']]

,last,email
1,Doe,JaneDoe@gmail.com
2,Doe,JohnDoe@gmail.com
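The cells above all rely on the same idea: a comparison on a column produces a boolean Series, and that Series selects rows. Here is a minimal, self-contained sketch using the same `people` data. The variable name `emails` is only illustrative.

```python
import pandas as pd

people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@gmail.com"],
}
df = pd.DataFrame(people)

# The comparison is evaluated row by row and yields one True/False per row.
filt = df["last"] == "Doe"
print(filt.tolist())

# df[filt] and df.loc[filt] select the same rows;
# .loc additionally accepts a column selection as its second argument.
emails = df.loc[filt, "email"]
print(emails.tolist())
```

Using `.loc` is convenient when rows and columns need to be narrowed in a single step.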


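## Errors when filtering with & and |

### With &, both sides must be wrapped in parentheses or an error is raised

`&` binds more tightly than `==`, so without parentheses Python evaluates `'Doe' & df['first']` first, which fails.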

In [16]:
df.loc[df['last']=='Doe' & df['first']=='John']

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

In [17]:
df.loc[(df['last']=='Doe') & (df['first']=='John')]

,first,last,email
2,John,Doe,JohnDoe@gmail.com
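The precedence issue can be seen without pandas. The sketch below uses plain integers (an illustrative choice, not taken from the notebook) to show how Python groups an unparenthesized expression, then builds the parenthesized mask on a small frame.

```python
import pandas as pd

# a == b & c == d is parsed as the chained comparison a == (b & c) == d.
# Here: 2 == (2 & 3) == 3  ->  2 == 2 == 3  ->  False
unparenthesized = 2 == 2 & 3 == 3
# With parentheses, each comparison is evaluated first and the results are combined.
parenthesized = (2 == 2) & (3 == 3)
print(unparenthesized, parenthesized)

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# Each comparison is wrapped so that & combines two boolean Series.
mask = (df["last"] == "Doe") & (df["first"] == "John")
result = df.loc[mask]
print(result)
```

The same rule applies to `|`, which is why the cell that follows also wraps each comparison in parentheses.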


In [18]:
df.loc[(df['last'] == 'Schafer' ) |( df['first'] == 'John')]

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [19]:
# ~ is the idiomatic operator for inverting a boolean mask
df.loc[~((df['last'] == 'Schafer') | (df['first'] == 'John'))]

,first,last,email
1,Jane,Doe,JaneDoe@gmail.com
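Negating a filter is easier to follow when the mask is stored in a variable first. This is a small sketch on the same three-row data. The names `mask` and `others` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# Rows where the last name is Schafer OR the first name is John.
mask = (df["last"] == "Schafer") | (df["first"] == "John")

# ~mask flips every True/False, selecting the rows the mask excluded.
others = df.loc[~mask]
print(others)
```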


In [21]:
df = pd.read_csv('~/Downloads/survey_results_public.csv')

In [22]:
schema_df = pd.read_csv('~/Downloads/survey_results_schema.csv')


In [24]:
df.loc[df['ConvertedComp'] > 7000]

,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
2,3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,6,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Canada,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Mathematics or statistics,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,28.0,Man,No,Straight / Heterosexual,East Asian,No,Too long,Neither easy nor difficult
8,9,I am a developer by profession,Yes,Once a month or more often,The quality of OSS and closed source software ...,Employed full-time,New Zealand,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,,23.0,Man,No,Bisexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
9,10,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,...,Somewhat less welcome now than last year,Tech articles written by other developers;Tech...,,,,,,Yes,Too long,Difficult
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88325,88878,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,26.0,Man,No,Straight / Heterosexual,South Asian,No,Appropriate in length,Easy
88326,88879,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Finland,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",...,Not applicable - I did not use Stack Overflow ...,,34.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
88328,88881,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Austria,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",...,,,37.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
88329,88882,I am a developer by profession,Yes,Never,"OSS is, on average, of LOWER quality than prop...",Employed full-time,Netherlands,"Yes, full-time","Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,,,Man,No,Straight / Heterosexual,White or of European descent,Yes,Too long,Easy


In [25]:
df.loc[df['ConvertedComp'] > 7000, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

,Country,LanguageWorkedWith,ConvertedComp
2,Thailand,HTML/CSS,8820.0
3,United States,C;C++;C#;Python;SQL,61000.0
5,Canada,Java;R;SQL,366420.0
8,New Zealand,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;P...,95179.0
9,India,C#;Go;JavaScript;Python;R;SQL,13293.0
...,...,...,...
88325,United States,HTML/CSS;JavaScript;Scala;TypeScript,130000.0
88326,Finland,Bash/Shell/PowerShell;C++;Python,82488.0
88328,Austria,Bash/Shell/PowerShell;Go;HTML/CSS;Java;JavaScr...,68745.0
88329,Netherlands,C#;HTML/CSS;Java;JavaScript;PHP;Python,588012.0
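The survey CSV is not needed to try this pattern. The sketch below uses a tiny hand-made frame with the same column names (the rows are sample values for illustration, not the survey data) to show a numeric threshold combined with a column selection, including what happens to missing values.

```python
import pandas as pd

# Hand-made sample; None becomes NaN in the float column.
df = pd.DataFrame({
    "Country": ["Thailand", "United States", "India", "Canada"],
    "ConvertedComp": [8820.0, 61000.0, None, 5000.0],
})

# The row filter goes first, the list of columns to keep goes second.
high = df.loc[df["ConvertedComp"] > 7000, ["Country", "ConvertedComp"]]
print(high)
```

Rows whose `ConvertedComp` is missing never satisfy `>`, so they drop out of the result along with the rows below the threshold.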


## isin works like SQL's WHERE A IN (...)

In [26]:
countries = ['United States', 'India', 'United Kingdom', 'Germany', 'Canada']
df.loc[df['Country'].isin(countries), 'Country']

0        United Kingdom
3         United States
5                Canada
7                 India
9                 India
              ...      
88859     United States
88863    United Kingdom
88864             India
88877     United States
88878            Canada
Name: Country, Length: 45008, dtype: object
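To make the comparison with SQL's `IN` concrete, this sketch checks on a small made-up `Country` column that `isin` gives the same mask as a chain of `==` comparisons joined with `|`. The sample values and variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["United States", "France", "India", "Brazil", "Canada"],
})
countries = ["United States", "India", "Canada"]

# One call replaces a chain of equality checks.
mask_isin = df["Country"].isin(countries)

# The equivalent spelled out with | (each comparison parenthesized).
mask_or = (
    (df["Country"] == "United States")
    | (df["Country"] == "India")
    | (df["Country"] == "Canada")
)

print(mask_isin.tolist())
print(df.loc[mask_isin, "Country"].tolist())
```

`isin` scales better to long lists of values, since the list can be built or loaded separately from the filter expression.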

## Use str.contains to filter rows by a keyword they contain

In [28]:
df.loc[df['LanguageWorkedWith'].str.contains('Python', na=False)]

,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
4,5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy
7,8,I code primarily as a hobby,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",India,,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,A lot more welcome now than last year,Tech articles written by other developers;Indu...,24.0,Man,No,Straight / Heterosexual,,,Appropriate in length,Neither easy nor difficult
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88854,84539,,Yes,Less than once a month but more than once per ...,The quality of OSS and closed source software ...,Employed full-time,United Kingdom,"Yes, full-time","Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Courses on technologies you're interested in,23.0,Woman,Yes,Bisexual,White or of European descent,No,Appropriate in length,Easy
88860,85738,,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",Brazil,"Yes, full-time","Secondary school (e.g. American high school, G...",,...,Just as welcome now as I felt last year,Industry news about technologies you're intere...,15.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina;White or of European...,No,Too short,Easy
88865,86566,,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Retired,Switzerland,No,Some college/university study without earning ...,"A humanities discipline (ex. literature, histo...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Cour...,74.0,Man,No,,White or of European descent,No,Appropriate in length,Easy
88872,87739,,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed part-time,Czech Republic,"Yes, full-time","Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,,25.0,,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
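A small sketch of the same call on made-up data may help show what `na=False` does. The frame below imitates the semicolon-separated `LanguageWorkedWith` column and includes one missing entry. The values are illustrative, not taken from the survey.

```python
import pandas as pd

df = pd.DataFrame({
    "Respondent": [1, 2, 3, 4],
    "LanguageWorkedWith": ["HTML/CSS;Python", None, "Java;R;SQL", "C;C++;Python;SQL"],
})

# str.contains tests each string for the substring;
# na=False makes missing entries count as "does not contain".
mask = df["LanguageWorkedWith"].str.contains("Python", na=False)
print(mask.tolist())

python_users = df.loc[mask]
print(python_users["Respondent"].tolist())
```

With `na=False` the mask is purely boolean, so it can be passed to `.loc` directly.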
