about the wide feature and deep feature #3

Usernamezhx · 2019-09-19T12:28:33Z

First of all, thanks for your work. I am a little confused about how the wide and deep features are split across embeddings_cols, continuous_cols, standardize_cols, wide_cols, crossed_cols and already_dummies. As I understand it, embeddings_cols holds the categorical features and continuous_cols holds the continuous features. What is the difference between standardize_cols and continuous_cols? The wide_cols features look similar to the categorical features. For crossed_cols, I understand that you want interaction features, but how do I select which columns to combine into an interaction feature? Thanks in advance.

jrzaurin · 2019-09-19T15:12:10Z

Hi @Usernamezhx

ok, let me explain. If you go here:
https://github.com/jrzaurin/Wide-and-Deep-PyTorch/blob/master/prepare_data.py#L52

you will read the following:

    embeddings_cols: List
        List containing just the name of the columns that will be represented
        with embeddings or a Tuple with the name and the embedding dimension.
        e.g.: [('education', 32), ('relationship', 16)]
    continuous_cols: List
        List with the name of the so called continuous cols
    standardize_cols: List
        List with the name of the continuous cols that will be standardised.
        Only included because the Airbnb dataset includes Longitude and
        Latitude, and it does not make sense to normalise those

The functions in prepare_data.py are highly customised to this particular problem. So, given this input for the Airbnb dataset:

    continuous_cols = ['latitude', 'longitude', 'security_deposit', 'extra_people']
    standardize_cols = ['security_deposit', 'extra_people']

what will happen is that 'security_deposit' and 'extra_people' will be standardised, while 'latitude' and 'longitude' will not (because it does not make sense to standardise coordinates).
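To make the difference between the two lists concrete, here is a minimal sketch. It is not the code from prepare_data.py: the helper `standardize_subset` and the toy values are made up for illustration, and it assumes plain zero-mean, unit-variance scaling.

```python
import pandas as pd

# Toy rows standing in for the Airbnb data (values invented for illustration).
df = pd.DataFrame({
    "latitude": [51.50, 51.52, 51.48],
    "longitude": [-0.12, -0.10, -0.15],
    "security_deposit": [0.0, 100.0, 200.0],
    "extra_people": [0.0, 10.0, 20.0],
})

continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]
standardize_cols = ["security_deposit", "extra_people"]


def standardize_subset(df, continuous_cols, standardize_cols):
    """Hypothetical helper: keep all continuous columns, scale only some."""
    out = df[continuous_cols].copy()
    for col in standardize_cols:
        # Zero mean and unit variance, using the population std (ddof=0).
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


deep_continuous = standardize_subset(df, continuous_cols, standardize_cols)
```

All four columns reach the deep side as continuous inputs, but only the two listed in standardize_cols are rescaled; latitude and longitude keep their raw values.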

Regarding the other column-type inputs, if you go here:
https://github.com/jrzaurin/Wide-and-Deep-PyTorch/blob/master/prepare_data.py#L128

you will read the following:

    wide_cols: List
        List with the name of the columns that will be one-hot encoded and
        passed through the Wide model
    crossed_cols: List
        List of Tuples with the name of the columns that will be "crossed"
        and then one-hot encoded. e.g. (['education', 'occupation'], ...)
    already_dummies: List
        List of columns that are already dummies/one-hot encoded

The wide columns are normally one-hot encoded and then passed through the model. However, some columns might already be one-hot encoded, and I call them already_dummies.
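As a rough illustration of that distinction, the sketch below one-hot encodes only the wide columns that are not already dummies and then appends the dummies unchanged. The column `has_wifi` and the toy data are hypothetical, and whether the repository expects already_dummies to also be listed in wide_cols is an assumption here.

```python
import pandas as pd

# Toy data; "has_wifi" is a made-up column that is already a 0/1 dummy.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Bachelors"],
    "occupation": ["Sales", "Tech", "Tech"],
    "has_wifi": [1, 0, 1],
})

wide_cols = ["education", "occupation", "has_wifi"]
already_dummies = ["has_wifi"]

# Only columns that are not dummies yet need one-hot encoding.
to_encode = [c for c in wide_cols if c not in already_dummies]
encoded = pd.get_dummies(df[to_encode], columns=to_encode, dtype=int)

# Columns that are already 0/1 are appended as they are.
wide = pd.concat([encoded, df[already_dummies]], axis=1)
```

Encoding `has_wifi` again would split it into two redundant columns, which is why it is passed through untouched.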

As for your last question, "how to select the element to constitute interaction feature?", the answer is that you have to experiment; there is no rule for that. If you have a couple of features and you think that their interaction might add useful information, then it is probably worth "crossing" them. For example, directly from the tensorflow tutorials: "...If you have a feature 'favorite_sport' and a feature 'home_city' and you're trying to predict whether a person likes to wear red, your linear model won't be able to learn that baseball fans from St. Louis especially like to wear red..."
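A small sketch of what "crossing" can look like in practice, using the two features from that quote. The helper `add_crossed`, the "-" separator and the toy rows are assumptions for illustration and may differ from what prepare_data.py does.

```python
import pandas as pd

# Toy data built around the example in the quote above.
df = pd.DataFrame({
    "favorite_sport": ["baseball", "baseball", "soccer"],
    "home_city": ["St. Louis", "Boston", "St. Louis"],
})

crossed_cols = [["favorite_sport", "home_city"]]


def add_crossed(df, crossed_cols):
    """Hypothetical helper: add one combined column per group of columns."""
    out = df.copy()
    new_names = []
    for cols in crossed_cols:
        name = "_".join(cols)
        # Each distinct combination of values becomes its own category.
        out[name] = out[cols].astype(str).apply("-".join, axis=1)
        new_names.append(name)
    return out, new_names


crossed_df, crossed_names = add_crossed(df, crossed_cols)

# The crossed column is then one-hot encoded like any other wide column.
wide_crossed = pd.get_dummies(
    crossed_df[crossed_names], columns=crossed_names, dtype=int
)
```

After encoding, "baseball-St. Louis" has its own column, so a linear (wide) model can give that specific combination its own weight, which it cannot do from the two separate one-hot encodings alone.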

Let me know if this helps

Usernamezhx · 2019-09-20T06:24:42Z

Thank you very much for your patient reply.

Usernamezhx closed this as completed Sep 20, 2019
