# Analyzing data from the "Stack Overflow 2019" survey

We start by importing the pandas library and reading the CSV files `'survey_results_public.csv'` and `'survey_results_schema.csv'`. The `index_col` parameter lets us pick one of the columns as the DataFrame's index, instead of using automatically generated integers. This lets us run lookups with the `.loc` property more directly and conveniently, since rows can be selected by a meaningful label (a respondent ID or a question code).

In [1]:
import pandas as pd

In [2]:
# file containing the answers collected in the survey
df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent')
# file containing the text of the questions asked in the survey
# (the column labels are abbreviated codes)
schema_df = pd.read_csv('data/survey_results_schema.csv', index_col='Column')
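
To see what `index_col` changes without the survey files at hand, here is a minimal sketch that reads a tiny excerpt of the schema file from an in-memory string. The two rows are a hypothetical stand-in for `survey_results_schema.csv`, with texts taken from the `schema_df.head()` output shown below:

```python
import io

import pandas as pd

# Tiny hypothetical excerpt of the schema file, kept in memory for the demo
csv_text = """Column,QuestionText
Hobbyist,Do you code as a hobby?
OpenSourcer,How often do you contribute to open source?
"""

# Without index_col: rows are labelled 0, 1, ... automatically
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.index.tolist())  # [0, 1]

# With index_col='Column': the question codes become the row labels
schema = pd.read_csv(io.StringIO(csv_text), index_col='Column')
print(schema.index.tolist())  # ['Hobbyist', 'OpenSourcer']

# Label-based lookup with .loc now uses the question code directly
print(schema.loc['Hobbyist', 'QuestionText'])  # Do you code as a hobby?
```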

In [3]:
# show the first records of the survey
df.head()

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


In [4]:
# show the first question texts
schema_df.head()

Column,QuestionText
Respondent,Randomized respondent ID number (not in order ...
MainBranch,Which of the following options best describes ...
Hobbyist,Do you code as a hobby?
OpenSourcer,How often do you contribute to open source?
OpenSource,How do you feel about the quality of open sour...


By default, pandas shows at most 20 columns of a DataFrame in Jupyter (and at most 60 rows); larger outputs are truncated with `...`. To display a larger number of rows (or columns), we can use:

In [5]:
# uncomment to raise the display limits (the survey file has 85 columns, Respondent included)
#pd.set_option('display.max_columns', 85)
#pd.set_option('display.max_rows', 85)
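
If changing the limits for the whole session is not desired, pandas also offers `pd.option_context`, which applies options only inside a `with` block and restores the previous values on exit. A small sketch using a made-up DataFrame with 30 columns, more than the 20 shown by default in Jupyter:

```python
import pandas as pd

# Hypothetical wide DataFrame: 2 rows x 30 columns
wide = pd.DataFrame([[i * j for j in range(30)] for i in range(2)],
                    columns=[f'col{j}' for j in range(30)])

before = pd.get_option('display.max_columns')

# Inside the block, up to 85 columns are printed before truncating with '...'
with pd.option_context('display.max_columns', 85):
    print(wide)
    inside = pd.get_option('display.max_columns')

# Outside the block, the previous setting is back in place
after = pd.get_option('display.max_columns')
print(inside, before == after)  # 85 True
```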

As mentioned at the beginning of the document, locating a row of the DataFrame is easy with the `.loc` property. For the main DataFrame `df`, the index is the respondent ID, which starts at `1`. This would be *slightly* different if we had not passed `index_col='Respondent'` to `read_csv()`: by default, the first row gets index `0`.

To display some of the data for the respondent with ID 13, just call:

In [6]:
df.loc[13]

MainBranch                         I am a developer by profession
Hobbyist                                                      Yes
OpenSourcer     Less than once a month but more than once per ...
OpenSource      OSS is, on average, of HIGHER quality than pro...
Employment                                     Employed full-time
                                      ...                        
Sexuality                                 Straight / Heterosexual
Ethnicity                            White or of European descent
Dependents                                                    Yes
SurveyLength                                Appropriate in length
SurveyEase                                                   Easy
Name: 13, Length: 84, dtype: object
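
The numbering difference described above can be reproduced with a three-respondent excerpt built from the first rows of `df.head()` (only two answer columns kept). Without `index_col`, `.loc[1]` refers to the *second* row; with `index_col='Respondent'`, it refers to the respondent whose ID is 1:

```python
import io

import pandas as pd

# Excerpt of the first three respondents, two answer columns only
csv_text = """Respondent,Hobbyist,Country
1,Yes,United Kingdom
2,No,Bosnia and Herzegovina
3,Yes,Thailand
"""

by_position = pd.read_csv(io.StringIO(csv_text))                     # index: 0, 1, 2
by_id = pd.read_csv(io.StringIO(csv_text), index_col='Respondent')  # index: 1, 2, 3

# Same label, different rows
print(by_position.loc[1, 'Country'])  # Bosnia and Herzegovina (second row)
print(by_id.loc[1, 'Country'])        # United Kingdom (respondent 1)
```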

For the `schema_df` DataFrame, the situation is quite different. Without the `index_col='Column'` parameter, the following syntax would not be possible:

In [7]:
schema_df.loc['OpenSource']

QuestionText    How do you feel about the quality of open sour...
Name: OpenSource, dtype: object

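Without `index_col`, we would have had to locate this same row by its number, with `schema_df.loc[4]`. Now that the index holds the question codes, that call fails; in the pandas version used here it raised:

`TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [4] of <class 'int'>`

(Newer pandas versions report a `KeyError: 4` instead.)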
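
When a row has to be reached by its *position* rather than its label, `.iloc` does that regardless of what the index contains. A sketch using the first five question codes from `schema_df.head()` above (the question texts are shortened here for brevity):

```python
import pandas as pd

# First five rows of the schema, with shortened question texts
schema = pd.DataFrame(
    {'QuestionText': ['Respondent ID', 'Main branch', 'Hobby?',
                      'Contributes to OSS?', 'Opinion on OSS quality']},
    index=pd.Index(['Respondent', 'MainBranch', 'Hobbyist',
                    'OpenSourcer', 'OpenSource'], name='Column'),
)

# Position 4 (the fifth row) is the 'OpenSource' question
print(schema.iloc[4].name)  # OpenSource

# The label 4 does not exist in a string index, so .loc fails
try:
    schema.loc[4]
    loc_failed = False
except (KeyError, TypeError) as err:  # KeyError in recent pandas, TypeError in older ones
    loc_failed = True
    print(type(err).__name__)
```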

Note, once again, that the text is truncated. Knowing the structure of this table, we can display the whole question by also passing the column label to `.loc`:

In [8]:
schema_df.loc['OpenSource', 'QuestionText']

'How do you feel about the quality of open source software (OSS)?'
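
Passing both a row label and a column label makes `.loc` return the cell value itself (here a plain Python string) instead of a `Series`, which is why the full text appears. For single-cell lookups pandas also provides `.at`, which takes the same pair of labels. A minimal sketch with a one-row stand-in for `schema_df`:

```python
import pandas as pd

# One-row stand-in for schema_df (text copied from the output above)
schema = pd.DataFrame(
    {'QuestionText': ['How do you feel about the quality of open source software (OSS)?']},
    index=pd.Index(['OpenSource'], name='Column'),
)

row = schema.loc['OpenSource']                   # Series: its printed form may be truncated
cell = schema.loc['OpenSource', 'QuestionText']  # the string itself
same_cell = schema.at['OpenSource', 'QuestionText']

print(type(row).__name__, type(cell).__name__)  # Series str
print(cell == same_cell)                        # True
```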

Filtering rows by the values of a column is done with **boolean indexing**, one of the marvels of the pandas library. For example, let's see which respondents are from Brazil:

In [19]:
filtro = (df['Country'] == 'Brazil')  # the parentheses are optional, but they improve readability!
filtro  # display the filter: one True/False value per respondent

Respondent
1        False
2        False
3        False
4        False
5        False
         ...  
88377    False
88601    False
88802    False
88816    False
88863    False
Name: Country, Length: 88883, dtype: bool

Creating the condition, therefore, is not enough. We also need to pass it to `df` inside square brackets, which returns the DataFrame indexed by this condition, i.e. only the rows where it is `True`:

In [10]:
df[filtro]

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
19,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Brazil,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",Received on-the-job training in software devel...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,31.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina,Yes,Too long,Easy
87,I am a developer by profession,Yes,Never,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,"Yes, part-time","Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Somewhat more welcome now than last year,Industry news about technologies you're intere...,36.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
130,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Brazil,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,23.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina,No,Appropriate in length,Easy
345,I am a developer by profession,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,33.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina,Yes,Too long,Neither easy nor difficult
350,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",Brazil,"Yes, full-time","Secondary school (e.g. American high school, G...",,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Woman,Yes,Bisexual,Multiracial,No,Appropriate in length,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42602,,No,Once a month or more often,"OSS is, on average, of LOWER quality than prop...","Not employed, and not looking for work",Brazil,"Yes, part-time",I never completed any formal education,,,...,Not applicable - I did not use Stack Overflow ...,Tech meetups or events in your area,,Woman,Yes,Straight / Heterosexual,East Asian,,Appropriate in length,Neither easy nor difficult
65265,,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,No,"Other doctoral degree (Ph.D, Ed.D., etc.)","Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers,45.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
72492,,Yes,Once a month or more often,The quality of OSS and closed source software ...,"Not employed, but looking for work",Brazil,No,Some college/university study without earning ...,I never declared a major,"Taught yourself a new language, framework, or ...",...,,Tech meetups or events in your area,22.0,Man,No,Bisexual;Gay or Lesbian,Hispanic or Latino/Latina,No,Too short,Easy
83000,,No,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,No,"Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,A lot more welcome now than last year,Tech meetups or events in your area;Courses on...,29.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult


The same result would be obtained with `df[df['Country'] == 'Brazil']`, although, for readability, this form is not really recommended. We can see, then, that 1948 of the respondents are right here in Brazil! Nice! But notice: the call `df[df['Country'] == 'Brazil']` returns the *entire dataset*, i.e. all of its columns, with only the rows filtered by the boolean condition we imposed.
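
Since `True` counts as 1 and `False` as 0, summing a boolean mask gives the number of matching rows, which is one way to get a count such as the 1948 Brazilian respondents without displaying the table. A sketch on a small made-up DataFrame (the respondent IDs, countries and ages are hypothetical):

```python
import pandas as pd

# Hypothetical mini-survey indexed by respondent ID
mini = pd.DataFrame(
    {'Country': ['Brazil', 'Canada', 'Brazil', None],
     'Age': [31.0, 28.0, 23.0, 40.0]},
    index=pd.Index([19, 20, 130, 131], name='Respondent'),
)

mask = mini['Country'] == 'Brazil'  # boolean Series aligned with mini's index
n_brazil = int(mask.sum())          # each True counts as 1

brazil = mini[mask]                 # only the True rows, but every column
print(n_brazil, brazil.shape)       # 2 (2, 2)
```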

However, with `.loc` we can ask for only the column(s) we want to see. For example, if I want to know whether these respondents are students:

In [11]:
df.loc[filtro, 'Student']

Respondent
19                   No
87       Yes, part-time
130                  No
345                  No
350      Yes, full-time
              ...      
42602    Yes, part-time
65265                No
72492                No
83000                No
85738    Yes, full-time
Name: Student, Length: 1948, dtype: object

Naturally, more than one column can be passed, as long as we don't forget the "double brackets" (a list of column names)!

In [12]:
df.loc[filtro, ['Student', 'Age']]

Respondent,Student,Age
19,No,31.0
87,"Yes, part-time",36.0
130,No,23.0
345,No,33.0
350,"Yes, full-time",19.0
...,...,...
42602,"Yes, part-time",
65265,No,45.0
72492,No,22.0
83000,No,29.0
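
The double brackets are simply a Python list of column names placed inside `.loc`'s brackets, and the type of the result changes with it: a single label returns a `Series`, while a list (even a one-element list) returns a `DataFrame`. A sketch with hypothetical respondents:

```python
import pandas as pd

# Hypothetical respondents
toy = pd.DataFrame(
    {'Country': ['Brazil', 'Canada', 'Brazil'],
     'Student': ['No', 'Yes, full-time', 'Yes, part-time'],
     'Age': [31.0, 19.0, 36.0]},
    index=pd.Index([19, 20, 87], name='Respondent'),
)
mask = toy['Country'] == 'Brazil'

one_col = toy.loc[mask, 'Student']            # Series
one_col_df = toy.loc[mask, ['Student']]       # DataFrame with one column
two_cols = toy.loc[mask, ['Student', 'Age']]  # DataFrame with two columns

print(type(one_col).__name__, type(one_col_df).__name__, two_cols.shape)
# Series DataFrame (2, 2)
```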


If I want records of respondents who are from Brazil **and** who are women, I use the AND operator, which in pandas is written `&`:

In [13]:
brazil_women = (df['Country'] == 'Brazil') & (df['Gender'] == 'Woman')
df[brazil_women]

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
350,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",Brazil,"Yes, full-time","Secondary school (e.g. American high school, G...",,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Woman,Yes,Bisexual,Multiracial,No,Appropriate in length,Easy
674,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed part-time,Brazil,"Yes, part-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,18.0,Woman,No,Straight / Heterosexual,Hispanic or Latino/Latina,No,Too long,Neither easy nor difficult
712,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,Brazil,No,Associate degree,Web development or web design,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech meetups or events in your area,28.0,Woman,No,Straight / Heterosexual,Hispanic or Latino/Latina;White or of European...,Yes,Appropriate in length,Easy
1201,I am a developer by profession,Yes,Less than once a month but more than once per ...,The quality of OSS and closed source software ...,Employed full-time,Brazil,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers,28.0,Woman,No,Gay or Lesbian,Hispanic or Latino/Latina;White or of European...,No,Too long,Neither easy nor difficult
1333,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed part-time,Brazil,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,,22.0,Woman,No,Gay or Lesbian,Hispanic or Latino/Latina;White or of European...,No,Appropriate in length,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87474,I am a developer by profession,Yes,Never,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,"Yes, full-time","Bachelor’s degree (BA, BS, B.Eng., etc.)","Another engineering discipline (ex. civil, ele...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Courses on technologies you're interested in,24.0,Woman,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
88014,I am a developer by profession,No,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,"Yes, part-time","Bachelor’s degree (BA, BS, B.Eng., etc.)","Information systems, information technology, o...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,26.0,Woman,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
88658,I am a developer by profession,No,Less than once a month but more than once per ...,The quality of OSS and closed source software ...,Employed part-time,Brazil,"Yes, full-time",Some college/university study without earning ...,"Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Industry news about technologies you're intere...,22.0,Woman,No,Straight / Heterosexual,Hispanic or Latino/Latina,No,Appropriate in length,Neither easy nor difficult
88688,I am a developer by profession,Yes,Never,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Brazil,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,A lot more welcome now than last year,Tech articles written by other developers;Indu...,28.0,Woman,No,Straight / Heterosexual,,No,Appropriate in length,Easy
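
The parentheses around each comparison in `brazil_women` are required: `&` binds more tightly than `==`, so without them Python would first try to evaluate `'Brazil' & df['Gender']`. Python's own `and` keyword does not work either, because it asks for a single truth value for a whole `Series`, which pandas refuses to guess. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical respondents
toy = pd.DataFrame({
    'Country': ['Brazil', 'Brazil', 'Canada'],
    'Gender': ['Woman', 'Man', 'Woman'],
})

# Element-wise AND: each condition in its own parentheses
brazil_women = (toy['Country'] == 'Brazil') & (toy['Gender'] == 'Woman')
print(brazil_women.tolist())  # [True, False, False]

# Python's `and` needs one True/False for the whole Series, so pandas raises
try:
    (toy['Country'] == 'Brazil') and (toy['Gender'] == 'Woman')
    and_failed = False
except ValueError:  # "The truth value of a Series is ambiguous..."
    and_failed = True
print(and_failed)  # True
```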


To display people who described their sexuality as `'Bisexual'` **or** whose ethnicity was given as `'East Asian'`, we use the OR condition, written `|`:

In [14]:
bisexual_or_east_asian = (df['Sexuality'] == 'Bisexual') | (df['Ethnicity'] == 'East Asian')
df[bisexual_or_east_asian]

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
6,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Canada,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Mathematics or statistics,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,28.0,Man,No,Straight / Heterosexual,East Asian,No,Too long,Neither easy nor difficult
9,I am a developer by profession,Yes,Once a month or more often,The quality of OSS and closed source software ...,Employed full-time,New Zealand,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,,23.0,Man,No,Bisexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
31,I am a student who is learning to code,No,Never,,"Not employed, and not looking for work",Canada,No,Primary/elementary school,,,...,,,,Woman,No,,East Asian,No,Too long,Neither easy nor difficult
34,I am a student who is learning to code,Yes,Never,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",Sri Lanka,No,Primary/elementary school,,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers,17.0,Man,,Bisexual,South Asian,Yes,Too short,Neither easy nor difficult
39,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Somewhat less welcome now than last year,Tech articles written by other developers,42.0,Man,No,Bisexual,White or of European descent,No,Appropriate in length,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66914,,No,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",,United Kingdom,"Yes, full-time","Professional degree (JD, MD, etc.)",I never declared a major,Participated in a full-time developer training...,...,Not applicable - I did not use Stack Overflow ...,Tech articles written by other developers;Indu...,,,No,Bisexual,,No,Appropriate in length,Easy
68401,,Yes,Never,The quality of OSS and closed source software ...,Employed part-time,Thailand,"Yes, full-time","Secondary school (e.g. American high school, G...",,,...,Somewhat less welcome now than last year,,,Woman;Man,No,Bisexual,,No,Too long,Easy
78266,,No,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,Japan,No,Some college/university study without earning ...,"Information systems, information technology, o...",Taken an online course in programming or softw...,...,Somewhat more welcome now than last year,Tech articles written by other developers,,Woman,No,Straight / Heterosexual,East Asian,Yes,Too long,Neither easy nor difficult
83397,,Yes,Less than once per year,,"Not employed, but looking for work",United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,,27.0,Woman,No,Bisexual,White or of European descent,No,Appropriate in length,Easy
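
Note that `==` only matches answers consisting of exactly that one option. Questions such as `Sexuality` and `Ethnicity` accept several options separated by `;` (respondent 72492 in the Brazil table, for instance, answered `Bisexual;Gay or Lesbian`), so an exact comparison leaves those respondents out. One possible approach, sketched below on hypothetical data, is a regular expression that matches the option only as a whole `;`-separated item. The helper `has_option` is a hypothetical name introduced here, not part of pandas:

```python
import re

import pandas as pd


def has_option(col: pd.Series, option: str) -> pd.Series:
    """Hypothetical helper: True where `option` is one of the ';'-separated answers."""
    # The option must be preceded by the start or ';' and followed by ';' or the end
    pattern = r'(?:^|;)' + re.escape(option) + r'(?:;|$)'
    return col.str.contains(pattern, regex=True, na=False)  # missing answers -> False


toy = pd.DataFrame({
    'Sexuality': ['Bisexual', 'Bisexual;Gay or Lesbian', 'Straight / Heterosexual', None],
    'Ethnicity': [None, 'Hispanic or Latino/Latina',
                  'East Asian;White or of European descent', 'South Asian'],
})

exact = (toy['Sexuality'] == 'Bisexual') | (toy['Ethnicity'] == 'East Asian')
multi = has_option(toy['Sexuality'], 'Bisexual') | has_option(toy['Ethnicity'], 'East Asian')

print(exact.tolist())  # [True, False, False, False]
print(multi.tolist())  # [True, True, True, False]
```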


To display the *complement* of this dataset, i.e. to **negate** a condition, just put `~` in front of the filter when indexing the dataset. For example, I am interested in people who did not find the survey 'Appropriate in length', that is, those who found it too long, too short, did not answer, etc.:

In [15]:
appropriate_length = (df['SurveyLength'] == 'Appropriate in length')
df[~appropriate_length]

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
6,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Canada,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Mathematics or statistics,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,28.0,Man,No,Straight / Heterosexual,East Asian,No,Too long,Neither easy nor difficult
10,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,,...,Somewhat less welcome now than last year,Tech articles written by other developers;Tech...,,,,,,Yes,Too long,Difficult
14,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,Germany,No,"Other doctoral degree (Ph.D, Ed.D., etc.)","Computer science, computer engineering, or sof...",Completed an industry certification program (e...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Cour...,31.0,Man,No,Straight / Heterosexual,White or of European descent,No,Too short,Easy
15,I am a student who is learning to code,Yes,Never,"OSS is, on average, of HIGHER quality than pro...","Not employed, but looking for work",India,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,20.0,Man,No,,,Yes,Too long,Neither easy nor difficult
19,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Brazil,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",Received on-the-job training in software devel...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,31.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina,Yes,Too long,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88182,,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed part-time,Pakistan,,"Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Not applicable - I did not use Stack Overflow ...,Courses on technologies you're interested in,,Man,No,Straight / Heterosexual,,Yes,Too short,Neither easy nor difficult
88282,,Yes,Once a month or more often,The quality of OSS and closed source software ...,"Not employed, but looking for work",United States,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,,,Man,No,Straight / Heterosexual,,No,Too short,Neither easy nor difficult
88601,,No,Never,The quality of OSS and closed source software ...,,,,,,,...,,,,,,,,,,
88802,,No,Never,,Employed full-time,,,,,,...,,,,,,,,,,
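
Because comparing a missing value (`NaN`) with a string yields `False`, negating the mask with `~` turns those rows into `True`. That is why respondents who skipped the question, such as 88601 and 88802 above, show up in the result. If only people who actually answered are wanted, the negated mask can be combined with `.notna()`. A sketch with hypothetical answers:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'SurveyLength': ['Appropriate in length', 'Too long', np.nan, 'Too short']})

appropriate = toy['SurveyLength'] == 'Appropriate in length'
not_appropriate = ~appropriate                                      # includes the NaN row
answered_not_appropriate = ~appropriate & toy['SurveyLength'].notna()

print(not_appropriate.tolist())           # [False, True, True, True]
print(answered_not_appropriate.tolist())  # [False, True, False, True]
```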


We can also perform numeric comparisons. For example, I am interested in people whose annual compensation converted to US dollars (`ConvertedComp`) is US$ 500,000 or more: where do they live, which languages do they program in, and how much do they earn?

In [16]:
salarios_altos = (df['ConvertedComp'] >= 500000)
df.loc[salarios_altos, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

Respondent,Country,LanguageWorkedWith,ConvertedComp
32,United States,Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;...,1100000.0
58,United States,C#;Java;SQL,2000000.0
102,United States,C#;HTML/CSS;JavaScript;SQL;TypeScript,2000000.0
128,United Kingdom,Bash/Shell/PowerShell;Go;Ruby,1000000.0
155,Germany,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Type...,962424.0
...,...,...,...
88818,United States,C#;HTML/CSS;JavaScript;SQL;TypeScript,2000000.0
88868,Germany,HTML/CSS;JavaScript;PHP;TypeScript,797436.0
88874,United States,C++;Python;Scala;SQL,2000000.0
88877,United States,Bash/Shell/PowerShell;C;Clojure;HTML/CSS;Java;...,2000000.0
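
Numeric comparisons follow the same rules: rows where `ConvertedComp` is missing compare as `False` and are left out by the filter. For a closed range, `Series.between` is a compact alternative to combining `>=` and `<=` with `&` (both bounds are inclusive by default). A sketch with hypothetical salaries:

```python
import numpy as np
import pandas as pd

comp = pd.Series([1_100_000.0, 45_000.0, np.nan, 500_000.0, 2_000_000.0],
                 index=pd.Index([32, 33, 34, 35, 36], name='Respondent'),
                 name='ConvertedComp')

high = comp >= 500_000                       # NaN -> False
in_range = comp.between(500_000, 1_000_000)  # same as (comp >= 500000) & (comp <= 1000000)

print(comp[high].index.tolist())      # [32, 35, 36]
print(comp[in_range].index.tolist())  # [35]
```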


Let's say I want to find high salaries within a certain set of countries. We can use the `.isin()` method, passing a list as the argument:

In [17]:
paises = ['India', 'Germany', 'Canada']
salarios_altos_nos_paises = (df['ConvertedComp'] >= 500000) & (df['Country'].isin(paises))
df.loc[salarios_altos_nos_paises, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

Respondent,Country,LanguageWorkedWith,ConvertedComp
155,Germany,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Type...,962424.0
273,Canada,Bash/Shell/PowerShell;C++;Clojure;Erlang;HTML/...,1000000.0
274,Canada,Java;JavaScript,1380000.0
442,Germany,HTML/CSS;JavaScript,687444.0
967,Canada,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Type...,1000000.0
...,...,...,...
88143,Germany,Bash/Shell/PowerShell;C;C++;C#;JavaScript;Pyth...,893688.0
88653,Canada,HTML/CSS;Java;JavaScript;Kotlin;SQL,1000000.0
88691,Canada,Bash/Shell/PowerShell;Go;PHP;Other(s):,1000000.0
88714,Canada,Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...,732852.0
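
`.isin()` is equivalent to chaining one `==` comparison per country with `|`, but stays short as the list grows; like `==`, it returns `False` for missing values. A sketch with hypothetical countries and the same list `paises`:

```python
import pandas as pd

paises = ['India', 'Germany', 'Canada']
country = pd.Series(['Germany', 'United States', 'Canada', None, 'India'])

with_isin = country.isin(paises)
with_or = (country == 'India') | (country == 'Germany') | (country == 'Canada')

print(with_isin.tolist())         # [True, False, True, False, True]
print(with_isin.equals(with_or))  # True
```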


Of the people above, which ones use Python?

In [18]:
# reuse the previous mask and add a condition on the languages
salarios_altos_nos_paises_usam_python = salarios_altos_nos_paises & df['LanguageWorkedWith'].str.contains('Python', na=False)
# na=False makes missing answers (NaN) count as False, so those rows are left out

df.loc[salarios_altos_nos_paises_usam_python, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

Respondent,Country,LanguageWorkedWith,ConvertedComp
2318,Germany,Bash/Shell/PowerShell;C;C++;JavaScript;Python,824940.0
2333,Germany,Assembly;Bash/Shell/PowerShell;C;C++;HTML/CSS;...,1000000.0
2482,Canada,Bash/Shell/PowerShell;C++;C#;Dart;F#;Go;HTML/C...,1000000.0
2827,Canada,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;P...,659928.0
5022,Germany,C++;HTML/CSS;Java;Python;SQL,866184.0
...,...,...,...
86978,Germany,Java;Python,893688.0
87172,Germany,C;C++;C#;Java;Python,700476.0
87341,Canada,Assembly;C;C++;HTML/CSS;Java;JavaScript;Object...,1000000.0
88143,Germany,Bash/Shell/PowerShell;C;C++;C#;JavaScript;Pyth...,893688.0
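
For text (object) columns, the pandas versions this notebook appears to use return `NaN` instead of `False` in the mask for rows with a missing `LanguageWorkedWith` unless `na=False` is given, and such a mask cannot be used for boolean indexing. Passing `regex=False` additionally treats the search term as plain text, which matters for names containing regex metacharacters, such as `C++`. A sketch with hypothetical answers:

```python
import numpy as np
import pandas as pd

langs = pd.Series(['C++;Python;SQL', 'Java;JavaScript', np.nan, 'C;C++'])

# na=False: a missing answer counts as "does not use Python"
python = langs.str.contains('Python', na=False)

# regex=False: '+' is matched literally instead of as a regex quantifier
cpp = langs.str.contains('C++', regex=False, na=False)

print(python.tolist())  # [True, False, False, False]
print(cpp.tolist())     # [True, False, False, True]
```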


In [21]:
# not sure where to put this yet ):

# add a new column (hypothetical 'first'/'last' columns, not present in the survey data)
#df['full_name'] = df['first'] + ' ' + df['last']
# remove columns (a list with two separate names)
#df.drop(columns=['first', 'last'], inplace=True)

In [None]:
# to revert what was done above:
#df[['first', 'last']] = df['full_name'].str.split(' ', expand=True)

In [None]:
# remove a row (by index label)
#df.drop(index=4, inplace=True)

# remove rows (by condition); without inplace=True the result would be discarded
#df.drop(index=df[df['last'] == 'Doe'].index, inplace=True)
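
The notes above can be tried on a small DataFrame. The columns `first`, `last` and `full_name` and the names in them are hypothetical, since the survey has no such columns. This sketch assigns each result back to the variable instead of using `inplace=True`:

```python
import pandas as pd

people = pd.DataFrame({'first': ['Ana', 'John'], 'last': ['Silva', 'Doe']})

# Add a column built from two others
people['full_name'] = people['first'] + ' ' + people['last']

# Drop the original columns
people = people.drop(columns=['first', 'last'])

# Revert: split full_name back into two columns (n=1 splits only at the first space)
people[['first', 'last']] = people['full_name'].str.split(' ', n=1, expand=True)

# Drop rows by condition: select the matching index labels, then drop them
people = people.drop(index=people[people['last'] == 'Doe'].index)

print(people)
```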

Let's see if I can display the salaries in descending order:

In [46]:
# descending by salary, ties broken by country (also descending); missing salaries go last
salarios = df.sort_values(by=['ConvertedComp', 'Country'], ascending=[False, False])
salarios[['Country', 'ConvertedComp']]

Respondent,Country,ConvertedComp
58,United States,2000000.0
102,United States,2000000.0
166,United States,2000000.0
436,United States,2000000.0
452,United States,2000000.0
...,...,...
88062,,
88076,,
88601,,
88802,,
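
`sort_values` places missing values at the end by default (`na_position='last'`), which is why respondents without a reported salary close the list above. When only the top of the ranking matters, `DataFrame.nlargest` returns the *n* rows with the largest values in a column directly, skipping missing values. A sketch with hypothetical salaries:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {'Country': ['Germany', 'United States', 'Canada', 'Canada'],
     'ConvertedComp': [962424.0, 2000000.0, np.nan, 2000000.0]},
    index=pd.Index([155, 58, 88601, 102], name='Respondent'),
)

# Descending by salary, ties broken by country (also descending); NaN goes last
ranked = toy.sort_values(by=['ConvertedComp', 'Country'], ascending=[False, False])
print(ranked.index.tolist())  # [58, 102, 155, 88601]

# The two highest salaries, ignoring missing values
top2 = toy.nlargest(2, 'ConvertedComp')
print(sorted(top2.index))     # [58, 102]
```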
