Examples: provide an example based a real dataset #125

Closed
loretoparisi opened this issue Apr 11, 2016 · 7 comments

@loretoparisi

I'm trying to figure out a real-world example with tpot to learn how it works, step by step.

So, I will take this dataset as an example. It looks like this:

{
    "laughter-good-conversation-friendships": {
        "category": "socializing",
        "category_id": 31,
        "score": 2.5,
        "topic_id": 27166
    },
    "live-theatre": {
        "category": "socializing",
        "category_id": 31,
        "score": 2.5,
        "topic_id": 23274
    },
    "nightlife": {
        "category": "socializing",
        "category_id": 31,
        "score": 2.5,
        "topic_id": 4392
    },
    "wine-tasting": {
        "category": "socializing",
        "category_id": 31,
        "score": 2.5,
        "topic_id": 15139
    },
    "women": {
        "category": "socializing",
        "category_id": 31,
        "score": 2.5,
        "topic_id": 10232
    },
    "business-entrepreneur-networking": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 20060
    },
    "business-referral-networking": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 15405
    },
    "christian": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 17963
    },
    "financial-chaos-to-financial-freedom": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 79053
    },
    "insprofs": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 10042
    },
    "professional-networking": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 15720
    },
    "real-estate-investors": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 16165
    },
    "retirement": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 6364
    },
    "socialnetwork": {
        "category": "career-business",
        "category_id": 2,
        "score": 0,
        "topic_id": 4422
    }
}

Each item in this JSON is a topic, like socialnetwork, that belongs to a category, like career-business. Several topics may belong to the same category (for example, wine-tasting and nightlife both belong to socializing), so it is a 1-to-N relationship.
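
The 1-to-N relationship described above can be sketched in Python as a grouping of topic names by category. This is only a minimal illustration: the `topics` dictionary is a small excerpt of the data shown above, and `group_by_category` is a hypothetical helper name, not part of TPOT.

```python
from collections import defaultdict

# Small excerpt of the dataset shown above (topic name -> attributes).
topics = {
    "wine-tasting": {"category": "socializing", "category_id": 31, "score": 2.5, "topic_id": 15139},
    "nightlife": {"category": "socializing", "category_id": 31, "score": 2.5, "topic_id": 4392},
    "socialnetwork": {"category": "career-business", "category_id": 2, "score": 0, "topic_id": 4422},
}


def group_by_category(topics):
    """Return a dict mapping each category to the sorted list of its topic names."""
    groups = defaultdict(list)
    for name, attrs in topics.items():
        groups[attrs["category"]].append(name)
    return {category: sorted(names) for category, names in groups.items()}


# One category maps to many topics (the "1 to N" side of the relationship).
categories = group_by_category(topics)
```

Grouping this way makes it easy to see which existing categories are broad enough that their topics might be split into new, finer-grained ones.
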
Now, there are some topics that may refer to the same broader topic, like here:

"dragons": {
    "category": "lifestyle",
    "category_id": 17,
    "score": 0.279108277990115,
    "topic_id": 2137
},
"furry-fandom": {
    "category": "lifestyle",
    "category_id": 17,
    "score": 0.279108277990115,
    "topic_id": 48595
},
"legendarycreatures": {
    "category": "lifestyle",
    "category_id": 17,
    "score": 0.279108277990115,
    "topic_id": 10523
}

Topics like dragons and legendarycreatures could be considered part of a new topic like fantasy.
So, assuming that tpot can analyze a dataset automatically, how should I describe this dataset so that the objective is to aggregate topics into new topics, that is, new categories that are not defined in this dataset?

@loretoparisi loretoparisi changed the title To provide a simple dataset example To provide an example on a real dataset Apr 11, 2016
@loretoparisi loretoparisi changed the title To provide an example on a real dataset To provide a working example on a real dataset Apr 11, 2016
@loretoparisi loretoparisi changed the title To provide a working example on a real dataset Examples: provide an example based a real dataset Apr 11, 2016
@rhiever
Contributor

rhiever commented Apr 11, 2016

Currently, TPOT only supports supervised classification tasks. Is what you're describing a supervised classification task, i.e., do you have a set of features with labels that you're attempting to model?

@loretoparisi
Author

OK, thanks. Let's suppose that I have manually tagged the features (in this case, the topics), labelling each one as belonging to a more generic, new topic. So let's say we have a test set with an initial classification into N categories/topics.

@rhiever
Contributor

rhiever commented Apr 11, 2016

What features would the algorithm have to classify the topic with?

@loretoparisi
Author

Since I have no description of a topic other than its label, I thought of using the word2vec feature vector of each label as the features, so basically I have the distances between labels.
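
As a rough sketch of what "distances" between label vectors could mean here, the snippet below computes a cosine distance in plain Python. The three-dimensional vectors are made-up placeholders, not real word2vec output, and `cosine_distance` is a hypothetical helper written for this illustration.

```python
import math

# Hypothetical 3-dimensional vectors standing in for real word2vec output,
# which would normally have many more dimensions.
vectors = {
    "dragons": [0.9, 0.1, 0.0],
    "legendarycreatures": [0.8, 0.2, 0.1],
    "retirement": [0.0, 0.1, 0.9],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


# Related labels should end up closer together than unrelated ones.
close = cosine_distance(vectors["dragons"], vectors["legendarycreatures"])
far = cosine_distance(vectors["dragons"], vectors["retirement"])
```

With these placeholder values, `close` is much smaller than `far`, which is the property that would let similar labels be treated as one broader topic.
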

@rhiever
Contributor

rhiever commented Apr 11, 2016

Gotcha. If you can feed TPOT features such as those distances (or even the vector values from word2vec) and provide it labels for those features, then it can try to optimize a pipeline that maximizes classification accuracy for that data set.
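
A minimal sketch of that setup, assuming hypothetical vectors and hypothetical manual labels: each topic becomes one row of features, and each manually assigned broader topic becomes an integer class. The `fit_tpot` function assumes a TPOT 0.x release that exposes `TPOTClassifier` with a scikit-learn-style `fit(features, target)`; the class name and constructor arguments vary between TPOT versions, so treat that part as an assumption. It is defined but not called here, and a real run would need far more labelled rows than this toy example.

```python
# Hypothetical word2vec-style vectors (real ones would be much longer).
vectors = {
    "dragons": [0.9, 0.1, 0.0],
    "legendarycreatures": [0.8, 0.2, 0.1],
    "retirement": [0.0, 0.1, 0.9],
    "real-estate-investors": [0.1, 0.0, 0.8],
}

# Hypothetical manual labels: the new, broader topic assigned to each topic.
manual_labels = {
    "dragons": "fantasy",
    "legendarycreatures": "fantasy",
    "retirement": "finance",
    "real-estate-investors": "finance",
}


def build_dataset(vectors, manual_labels):
    """Return (X, y, label_names): one feature row and one integer class per topic."""
    label_names = sorted(set(manual_labels.values()))
    label_to_id = {name: i for i, name in enumerate(label_names)}
    names = sorted(vectors)  # fixed row order so X and y stay aligned
    X = [list(vectors[name]) for name in names]
    y = [label_to_id[manual_labels[name]] for name in names]
    return X, y, label_names


def fit_tpot(X, y):
    """Fit a TPOT pipeline on (X, y). Assumes TPOT 0.x with TPOTClassifier."""
    import numpy as np
    from tpot import TPOTClassifier  # assumption: name depends on the TPOT version

    model = TPOTClassifier(generations=5, population_size=20, random_state=42)
    model.fit(np.array(X), np.array(y))
    return model


X, y, label_names = build_dataset(vectors, manual_labels)
```

Encoding the labels as integers keeps the target in a form that scikit-learn-style estimators accept, and `label_names` lets predictions be mapped back to topic names.
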

@loretoparisi
Author

@rhiever OK, what is the input data format? Let's say a typical row I would get from word2vec for a label is an array of floating-point numbers. Does this work?
