-
Notifications
You must be signed in to change notification settings - Fork 0
/
new_or_used.py
64 lines (49 loc) · 2.04 KB
/
new_or_used.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
"""
Exercise description
--------------------
In the context of Mercadolibre's Marketplace an algorithm is needed to
predict if an item listed in the markeplace is new or used.
Your task to design a machine learning model to predict if an item is new or
used and then evaluate the model over held-out test data.
To assist in that task a dataset is provided in `MLA_100k.jsonlines` and a
function to read that dataset in `build_dataset`.
For the evaluation you will have to choose an appropiate metric and also
elaborate an argument on why that metric was chosen.
The deliverables are:
- This file including all the code needed to define and evaluate a model.
- A text file with a short explanation on the criteria applied to choose
the metric and the performance achieved on that metric.
"""
#División aleatoria con train_test_split
import json
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
# You can safely assume that `build_dataset` is correctly implemented
def build_dataset():
data = [json.loads(x) for x in open("MLA_100k.jsonlines")]
target = lambda x: x.get("condition")
N = -10000
X_train = data[:N]
X_test = data[N:]
y_train = [target(x) for x in X_train]
y_test = [target(x) for x in X_test]
for x in X_test:
del x["condition"]
return X_train, y_train, X_test, y_test
# preparar datos de entrada
def prepare_inputs (X_train, X_test):
oe = OrdinalEncoder()
oe.fit (X_train)
X_train_enc = oe.transform(X_train)
X_test_enc = oe.transform(X_test)
return X_train_enc, X_test_enc
#if __name__ == "__main__":
# print("Loading dataset...")
# Train and test data following sklearn naming conventions
# X_train (X_test too) is a list of dicts with information about each item.
# y_train (y_test too) contains the labels to be predicted (new or used).
# The label of X_train[i] is y_train[i].
# The label of X_test[i] is y_test[i].
# X_train, y_train, X_test, y_test = build_dataset()
# Insert your code below this line:
# ...