# Quiz CSV Loader

This notebook downloads quiz CSVs from Google Cloud Storage, processes them, and inserts the data into a MySQL database.

Each CSV is expected to have the following format:

```
,Question,Answer,Correct
,Which regression model uses the L1 regularization technique?,Ridge Regression,FALSE
,,Lasso Regression,TRUE
,,Both A and B,FALSE
,,None of the above,FALSE
,What would you use to solve the problem of overfitting?,Regularization technique,FALSE
,,Cross Validation,FALSE
,,Drop out,FALSE
,,All of the above,TRUE
```

The first column is always empty. When the second column (the `Question` column) is empty, the row holds an answer to the most recent non-empty `Question` value.
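
The carry-forward rule for the `Question` column can be illustrated without pandas. The following is a minimal sketch using only the standard library; the helper name `parse_quiz_rows` and the shortened `SAMPLE` text are hypothetical and not part of the notebook.

```python
import csv
import io

# Shortened sample in the expected format (first column always empty).
SAMPLE = """\
,Question,Answer,Correct
,Which regression model uses the L1 regularization technique?,Ridge Regression,FALSE
,,Lasso Regression,TRUE
,What would you use to solve the problem of overfitting?,Regularization technique,FALSE
,,All of the above,TRUE
"""


def parse_quiz_rows(text):
    """Return (question, answer, correct) tuples, carrying the last question forward."""
    rows = []
    last_question = None
    for record in csv.DictReader(io.StringIO(text)):
        question = record["Question"].strip()
        if question:
            # A non-empty Question cell starts a new question.
            last_question = question
        answer = record["Answer"].strip()
        if not answer:
            # Rows without an answer carry no data.
            continue
        correct = record["Correct"].strip().upper() == "TRUE"
        rows.append((last_question, answer, correct))
    return rows


rows = parse_quiz_rows(SAMPLE)
```

Each returned tuple pairs an answer with the question it belongs to, which is the same shape the pandas cells below produce with `ffill()`.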

In [3]:
# First, copy the CSVs from the bucket into a local directory.
!mkdir -p csvs
!gsutil cp gs://ml-quiz-project-csvs/* csvs/.

Copying gs://ml-quiz-project-csvs/art-and-science-of-ml.csv...
/ [1 files][ 17.5 KiB/ 17.5 KiB]                                                
Operation completed over 1 objects/17.5 KiB.                                     


In [29]:
# Import libraries
import pandas as pd
import uuid

In [18]:
df = pd.read_csv("csvs/art-and-science-of-ml.csv")
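# Keep only the meaningful columns and drop rows that have no answer.
df = df[["Question", "Answer", "Correct"]]
df = df[df["Answer"].notna()].copy()
# Fill empty Question cells with the last non-empty question above them.
df["Question"] = df["Question"].ffill()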
# df['question_id'] = df['Question'].apply(lambda x: x.ffill().shift(1))
df

Unnamed: 0,Question,Answer,Correct
0,Which regression model uses the L1 regularizat...,Ridge Regression,False
1,Which regression model uses the L1 regularizat...,Lasso Regression,True
2,Which regression model uses the L1 regularizat...,Both A and B,False
3,Which regression model uses the L1 regularizat...,None of the above,False
4,What would you use to solve the problem of ove...,Regularization technique,False
...,...,...,...
145,Which of the following statements is true abou...,"For a logit layer, similar to a single logit, ...",False
146,Which of the following statement is incorrect?,The Categorical columns are represented by ten...,False
147,Which of the following statement is incorrect?,Tensorflow can do math operations on sparse te...,False
148,Which of the following statement is incorrect?,"The more dimensions you have, the greater chan...",False


In [45]:
unique_questions = pd.Series(df.Question.unique())
questions = pd.DataFrame({"question_id": unique_questions.apply(lambda x: uuid.uuid4()), "question": unique_questions})
joined_table = pd.merge(questions, df, left_on='question', right_on='Question')
# Generate one ID per row of the merged table (not of df) so the lengths always match.
joined_table['answer_id'] = [uuid.uuid4() for _ in range(len(joined_table.index))]
answers = joined_table[['question_id', 'answer_id', 'Answer', 'Correct']]
answers = answers.rename({'Answer': 'answer', 'Correct': 'correct'}, axis='columns')
questions

Unnamed: 0,question_id,question
0,7a11d277-4a54-44c0-af3e-203c1e1b7103,Which regression model uses the L1 regularizat...
1,2647ca03-14ac-47c3-ae02-8d1337f5c7af,What would you use to solve the problem of ove...
2,883f5170-e629-46ca-8ffe-e0c36c9f6645,Why would you use the square of the L2 norm?
3,13d058ca-eaf5-4ef3-a7ad-aa0b3469b42c,Which regression model uses the L2 regularizat...
4,50808aa4-d80f-4868-a700-38044e5bc956,Why is regularization useful?
5,edde16af-6587-4c1a-98b1-0e5ebd3ee555,Which of the following is not true about the L...
6,77eff38f-f4c9-4b86-ab3e-e754132e1004,Which of the following is an example of a hype...
7,b0ba7d34-aa07-42a5-ab71-0b191cb02f1b,How would you use the Cloud AI Platform Traini...
8,bb0ea1fe-44e6-4960-8c1f-f3b93fac4e80,How would you ensure the outputs of different ...
9,c996a3b4-ef33-4f11-a119-03664d6a75be,How do you supply hyperparameters to the train...


In [47]:
questions.head(n=10)

Unnamed: 0,question_id,question
0,7a11d277-4a54-44c0-af3e-203c1e1b7103,Which regression model uses the L1 regularizat...
1,2647ca03-14ac-47c3-ae02-8d1337f5c7af,What would you use to solve the problem of ove...
2,883f5170-e629-46ca-8ffe-e0c36c9f6645,Why would you use the square of the L2 norm?
3,13d058ca-eaf5-4ef3-a7ad-aa0b3469b42c,Which regression model uses the L2 regularizat...
4,50808aa4-d80f-4868-a700-38044e5bc956,Why is regularization useful?
5,edde16af-6587-4c1a-98b1-0e5ebd3ee555,Which of the following is not true about the L...
6,77eff38f-f4c9-4b86-ab3e-e754132e1004,Which of the following is an example of a hype...
7,b0ba7d34-aa07-42a5-ab71-0b191cb02f1b,How would you use the Cloud AI Platform Traini...
8,bb0ea1fe-44e6-4960-8c1f-f3b93fac4e80,How would you ensure the outputs of different ...
9,c996a3b4-ef33-4f11-a119-03664d6a75be,How do you supply hyperparameters to the train...


In [48]:
answers.head(n=10)

Unnamed: 0,question_id,answer_id,answer,correct
0,7a11d277-4a54-44c0-af3e-203c1e1b7103,3d76e3ea-ed64-4756-92f6-14c555659dc0,Ridge Regression,False
1,7a11d277-4a54-44c0-af3e-203c1e1b7103,ab4cb843-938c-4cac-95a0-7624d7f43f80,Lasso Regression,True
2,7a11d277-4a54-44c0-af3e-203c1e1b7103,b7f161e6-9963-4fa0-bdb6-771149f91ec4,Both A and B,False
3,7a11d277-4a54-44c0-af3e-203c1e1b7103,33e8f168-d9a6-4842-bcf7-2cec2bbef8ff,None of the above,False
4,2647ca03-14ac-47c3-ae02-8d1337f5c7af,14331346-70fa-4cd8-b88e-100695dd48d8,Regularization technique,False
5,2647ca03-14ac-47c3-ae02-8d1337f5c7af,00a55669-eb9f-4ec7-a324-f43fa2242a62,Cross Validation,False
6,2647ca03-14ac-47c3-ae02-8d1337f5c7af,079bf31c-be7d-4eda-a5e1-9e64577fdc45,Drop out,False
7,2647ca03-14ac-47c3-ae02-8d1337f5c7af,28822016-2ab9-40ed-93c1-7b977c464f7a,All of the above,True
8,883f5170-e629-46ca-8ffe-e0c36c9f6645,c0177bfd-0e28-43a4-a2f5-3ddf1fa096fa,To increase the calculation of derivatives,False
9,883f5170-e629-46ca-8ffe-e0c36c9f6645,47172bfb-cc1f-4ab0-89c8-082b55315422,To minimize the training error,False


In [51]:
answers.dtypes

question_id    object
answer_id      object
answer         object
correct          bool
dtype: object
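
The `object` dtype on the two ID columns reflects that they hold `uuid.UUID` instances rather than strings, so converting them with `str()` before insertion is a safe choice. The cells above stop before the database step; the following is a minimal sketch of how it might look, not the notebook's actual implementation. The schema (table and column names mirroring the DataFrames) and the helper `insert_quiz` are assumptions, and `sqlite3` is used as a stand-in for MySQL so the example runs without a server. With a MySQL driver the placeholder style is typically `%s` rather than `?`.

```python
import sqlite3
import uuid

# Hypothetical schema: table and column names mirror the DataFrames above.
SCHEMA = """
CREATE TABLE questions (
    question_id CHAR(36) PRIMARY KEY,
    question    TEXT NOT NULL
);
CREATE TABLE answers (
    answer_id   CHAR(36) PRIMARY KEY,
    question_id CHAR(36) NOT NULL REFERENCES questions (question_id),
    answer      TEXT NOT NULL,
    correct     BOOLEAN NOT NULL
);
"""


def insert_quiz(conn, question_rows, answer_rows):
    """Insert (question_id, question) and (question_id, answer_id, answer, correct) rows."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO questions (question_id, question) VALUES (?, ?)",
            [(str(qid), text) for qid, text in question_rows],
        )
        conn.executemany(
            "INSERT INTO answers (question_id, answer_id, answer, correct) "
            "VALUES (?, ?, ?, ?)",
            [(str(qid), str(aid), text, bool(ok)) for qid, aid, text, ok in answer_rows],
        )


# Stand-in data shaped like the rows of the questions and answers tables.
qid = uuid.uuid4()
question_rows = [(qid, "Which regression model uses the L1 regularization technique?")]
answer_rows = [
    (qid, uuid.uuid4(), "Ridge Regression", False),
    (qid, uuid.uuid4(), "Lasso Regression", True),
]

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.executescript(SCHEMA)
insert_quiz(conn, question_rows, answer_rows)
```

In the notebook, the row tuples could be taken from the DataFrames with `questions.itertuples(index=False, name=None)` and `answers.itertuples(index=False, name=None)`, which yield values in the column order the helper expects.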