**03: Indexes - Set, Reset and Use Indexes**
- An index is essentially a set of row labels; set a custom one when the default integer index does not describe the rows well

In [9]:
import pandas as pd

In [11]:
students = {
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'age': [9, 10, 10, 9],
    'subjects': ['Science', 'Arts', 'Hybrid', 'Arts'],
    'award winner': [True, False, False, True]
}

df = pd.DataFrame(students)
df

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


^The leftmost column is the default index: integer labels, starting at 0 for the first row.
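
As a quick check of the note above, the default index can be inspected directly. This is a minimal sketch on a throwaway dataframe (`df_demo` is a name made up for this example, not part of the notebook's data):

```python
import pandas as pd

# Small illustrative dataframe; df_demo is a hypothetical name
df_demo = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]})

# With no index specified, pandas assigns a RangeIndex starting at 0
default_index = df_demo.index
index_labels = list(default_index)
print(default_index)
print(index_labels)
```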

***
Creating Indexes:
- Gives rows more descriptive labels than the default 0, 1, ...

In [60]:
df.set_index('names', inplace=True)
#replaces the default index with the 'names' column
#inplace=True modifies df directly instead of returning a new dataframe

df

names,age,subjects,award winner
Tom,9,Science,True
Bob,10,Arts,False
Jane,10,Hybrid,False
May,9,Arts,True
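
An alternative to inplace=True, sketched here on a throwaway dataframe (`students_demo` and `indexed` are names made up for this example): without inplace, set_index returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Hypothetical two-row dataframe for illustration
students_demo = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]})

# Without inplace=True, set_index returns a new dataframe
indexed = students_demo.set_index('names')

print(indexed)                 # 'names' is now the index
print(students_demo.columns)   # original still has 'names' as a column
```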


In [30]:
df.loc['Tom']
#.loc selects rows by label, which is often easier than remembering a row's integer position

age                   9
subjects        Science
award winner       True
Name: Tom, dtype: object
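
For comparison, a minimal sketch of label-based versus position-based lookup on a throwaway dataframe (`demo` is a name made up for this example). It assumes 'Tom' is the first row, as in the table above.

```python
import pandas as pd

# Hypothetical dataframe indexed by name
demo = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]}).set_index('names')

by_label = demo.loc['Tom']       # select by index label
by_position = demo.iloc[0]       # select by integer position
toms_age = demo.loc['Tom', 'age']  # [row label, column label] gives one value

print(by_label)
print(toms_age)
```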

In [62]:
df.reset_index(inplace=True)
#caution: running this cell again adds an extra 'index' column holding the old integer index
#to remove that column: df.drop(columns=['index'], inplace=True)
df

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
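
The caution about repeated resets can be reproduced on a throwaway dataframe. This is a sketch only (`demo`, `once`, `twice` and `clean` are names made up for this example); it also shows drop=True, which discards the old index instead of turning it into a column.

```python
import pandas as pd

# Hypothetical dataframe indexed by name
demo = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]}).set_index('names')

once = demo.reset_index()            # 'names' moves back to a regular column
twice = once.reset_index()           # old 0, 1 index becomes a new 'index' column
clean = once.reset_index(drop=True)  # drop=True discards the old index instead

print(list(once.columns))
print(list(twice.columns))
print(list(clean.columns))
```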


***
Real Data Section:
- Creating indexes: the index can be set while reading the CSV file
- Grabbing question details more conveniently with qname as the index
- Sorting the modified dataframe

In [70]:
response_df = pd.read_csv('data.csv', index_col='ResponseId')
#index_col works like .set_index, but sets the index while the file is being read

response_df.head(3)

ResponseId,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,,...,,,,,,,,,,
2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,,
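
To illustrate that index_col and set_index lead to the same result, here is a small sketch that reads an in-memory CSV instead of the real file. The two-row `csv_text` is a stand-in written for this example, not the survey data.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for data.csv (illustrative only)
csv_text = "ResponseId,Age\n1,Under 18 years old\n2,35-44 years old\n"

# Set the index while reading
at_read = pd.read_csv(io.StringIO(csv_text), index_col='ResponseId')

# Read first, then set the index
after_read = pd.read_csv(io.StringIO(csv_text)).set_index('ResponseId')

print(at_read.equals(after_read))
```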


In [72]:
schema_df = pd.read_csv('schema_data.csv', index_col = 'qname')
pd.set_option('display.max_rows', 114)
schema_df

qname,qid,question,force_resp,type,selector
MainBranch,QID2,Which of the following options best describes ...,True,MC,SAVR
Age,QID127,What is your age?*,True,MC,SAVR
Employment,QID296,Which of the following best describes your cur...,True,MC,MAVR
RemoteWork,QID308,Which best describes your current work situation?,False,MC,SAVR
Check,QID341,Just checking to make sure you are paying atte...,True,MC,SAVR
CodingActivities,QID297,Which of the following best describes the code...,False,MC,MAVR
EdLevel,QID25,Which of the following best describes the high...,True,MC,SAVR
LearnCode,QID276,How do you learn to code? Select all that apply.,False,MC,MAVR
LearnCodeOnline,QID281,What online resources do you use to learn to c...,False,MC,MAVR
TechDoc,QID331,What is the source of the technical documentat...,False,MC,MAVR


In [76]:
#Let's say we are trying to figure out what 'Frustration' means in the survey
schema_df.loc['Frustration', 'question']
#recall: .loc[row label, column label]

'Which of these company challenges causes you the most frustration? Select all that apply.'

In [80]:
schema_df.sort_index()
#pass ascending=False for descending order

qname,qid,question,force_resp,type,selector
AIAcc,QID316,How much do you trust the accuracy of the outp...,False,MC,SAVR
AIBen,QID324,For the AI tools you use as part of your devel...,False,MC,MAVR
AIChallenges,QID346,What are the challenges to your company/whole ...,False,MC,MAVR
AIComplex,QID343,How well do the AI tools you use in your devel...,False,MC,SAVR
AIEthics,QID339,Which AI ethical responsibilities are most imp...,False,MC,MAVR
AINext,QID320,Thinking about how your job and process change...,False,Matrix,Likert
AISearchDev,QID327,Which <b>AI-powered search and developer tools...,False,Matrix,Likert
AISelect,QID314,Do you currently use AI tools in your developm...,True,MC,SAVR
AISent,QID315,How favorable is your stance on using AI tools...,False,MC,SAVR
AIThreat,QID338,Do you believe AI is a threat to your current ...,False,MC,SAVR


In [82]:
schema_df.head(4)

qname,qid,question,force_resp,type,selector
MainBranch,QID2,Which of the following options best describes ...,True,MC,SAVR
Age,QID127,What is your age?*,True,MC,SAVR
Employment,QID296,Which of the following best describes your cur...,True,MC,MAVR
RemoteWork,QID308,Which best describes your current work situation?,False,MC,SAVR


Notice that the original schema_df dataframe is still unsorted: sort_index() returns a sorted copy, and schema_df itself changes only if inplace=True is passed.
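
The same behaviour can be checked on a small stand-in for the schema (`demo` is a name made up for this example; the three qname/qid pairs are taken from the table above). To keep the sorted order, assign the result back to a variable.

```python
import pandas as pd

# Three rows copied from the schema output above, in their original order
demo = pd.DataFrame(
    {'qid': ['QID2', 'QID127', 'QID296']},
    index=pd.Index(['MainBranch', 'Age', 'Employment'], name='qname'),
)

sorted_demo = demo.sort_index()                # new dataframe, ascending by label
descending = demo.sort_index(ascending=False)  # new dataframe, descending by label

print(list(demo.index))         # original order is unchanged
print(list(sorted_demo.index))
print(list(descending.index))
```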