DOC Add get_ken_embeddings example #602

Merged
merged 17 commits into from
Jun 15, 2023

Conversation

jovan-stojanovic
Member

Fix for #578

Member

@GaelVaroquaux GaelVaroquaux left a comment

A couple of comments

>>> games_embedding = get_ken_embeddings(types="video_games")
>>> games_embedding.head()
Entity Type ... X198 X199
0 <The_Mysterious_Island> <wikicat_novels_adapted_into_video_games> ... -0.072814 -0.156973
Member

I don't understand: we still have the "<" here, it seems

Member Author

Thanks, corrected after merging with #601.

@@ -99,6 +99,7 @@ def get_ken_types(
search_result = unique_types.X[unique_types.X["Type"].str.contains(search)]
if exclude is not None:
search_result = search_result[~search_result["Type"].str.contains(exclude)]
search_result["Type"] = search_result["Type"].str.replace("<", "").str.replace(">", "")
Member

Wouldn't it be better to use ".str[1:-1]"? What if there were a "<" inside the string (maybe that's impossible)?

Member Author

These are extracted from Wikipedia URLs, hence the "_" instead of spaces.
Having ">" or "<" characters inside them is apparently not possible:
https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid#1547940
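
For reference, a minimal sketch of how the two approaches in this thread differ. The values below are made-up toy strings, not actual KEN types, and the second one contains an inner "<" purely to show the edge case (which, per the link above, should not occur in practice):

```python
import pandas as pd

# Toy values shaped like the "Type" column; the second one is an
# artificial edge case with a "<" inside the name.
types = pd.Series(
    ["<wikicat_novels_adapted_into_video_games>", "<wikicat_a<b_games>"]
)

# Approach from the diff: remove every "<" and ">" anywhere in the string.
replaced = types.str.replace("<", "", regex=False).str.replace(">", "", regex=False)

# Suggested alternative: drop only the first and last character.
sliced = types.str[1:-1]

print(replaced.tolist())
# ['wikicat_novels_adapted_into_video_games', 'wikicat_ab_games']
print(sliced.tolist())
# ['wikicat_novels_adapted_into_video_games', 'wikicat_a<b_games']
```

Both give the same result when the brackets only appear at the ends; they differ only if a bracket could appear inside the name.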

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 15, 2023 via email

@jovan-stojanovic
Member Author

I also have doubts about the name of the types parameter of the get_ken_embeddings function. People may not realize that it actually performs a search rather than an exact match. I would rather rename it to types_search or search.
I guess it's now or never after the release :)
WDYT?
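
To illustrate the concern, here is a rough sketch of the substring-search behaviour, modelled on the get_ken_types snippet quoted earlier in this thread. The helper name search_types and the toy data are hypothetical and not part of the library:

```python
import pandas as pd

# Toy stand-in for the table of unique types (illustrative values only).
unique_types = pd.DataFrame(
    {
        "Type": [
            "<wikicat_video_games>",
            "<wikicat_video_game_companies>",
            "<wikicat_board_games>",
        ]
    }
)


def search_types(df, search, exclude=None):
    """Hypothetical helper: keep rows whose Type contains `search`,
    optionally dropping rows whose Type contains `exclude`."""
    result = df[df["Type"].str.contains(search)]
    if exclude is not None:
        result = result[~result["Type"].str.contains(exclude)]
    # Strip the surrounding "<" and ">" without mutating the input frame.
    return result.assign(Type=result["Type"].str[1:-1])


print(search_types(unique_types, "video_game")["Type"].tolist())
# ['wikicat_video_games', 'wikicat_video_game_companies']
print(search_types(unique_types, "video_game", exclude="companies")["Type"].tolist())
# ['wikicat_video_games']
```

A value such as "video_game" matches every type that contains it, which is why a name like search may describe the parameter better than types.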

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 15, 2023 via email

@GaelVaroquaux
Member

There's a conflict :(

Also, don't we need to do the renaming in the API docs as well?

@GaelVaroquaux
Member

LGTM. Merging

@GaelVaroquaux GaelVaroquaux merged commit 8c83949 into skrub-data:main Jun 15, 2023
17 of 18 checks passed
@jovan-stojanovic jovan-stojanovic deleted the add_ken_fetch_example branch July 21, 2023 14:06