New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lang_links property to WikipediaPage #45
Conversation
This property method returns a list of links to versions of the Wikipedia page in different languages. The result is a list of (language_prefix, title) tuples.
Whouldn't be better to be a list of (language_prefix, page_id, (optionan = titles)) tuples? |
Um - could you explain why it'd be better? I chose (language_prefix, title) to be consistent with the `links' property method, which returns page titles rather than page IDs. But I'm open to suggestions :) |
For important articles with many languages, is less probable that a encoding error appears due to a "title request". Numbers are clear and for sure, more direct and easier to handle for wikipedia for future requests. I think you should provide both, titles and page ids. What do you think? EDIT: before your contribution I developed a small function to obtain the langlinks.
|
So we have to make an API request for each (language, title) to get the corresponding page ID? For a popular page like "Wikipedia", that means >250 requests in this method. That's pretty expensive, especially if the user doesn't need to load the page in all of the languages. On the other hand, to get the page ID, I'd be doing
If the caller actually does want to access the content of the page in different languages, they will have to do the following:
In other words, they'd have to send a duplicate request for each language following this method. An alternative is to return WikipediaPage objects from langlinks; however, since each of the returned WikipediaPage objects will have a different language, but _wiki_requests() always uses the global singleton language setting (API_URL), none of the property methods on the returned WikipediaPage objects would work. |
Or, we can refactor WikipediaPage to be language-aware - i.e., we can add a `lang' field to WikipediaPage, and change page(), _wiki_request(), and WikipediaPage property methods to accept lang as an argument, rather than relying on a global language setting. That would also fix the following problem which we have today:
It'd be cleaner, but it'd be a much bigger change... |
Alright, I just did some refactoring and pushed out a new version so I'm not sure if this PR will still work. Thanks for the awesome discussion @jichuan89 and @javierprovecho so far, I think that we should definitely add I18N support more formally now (the first version of support was pretty hacky, as you can tell). I'm definitely in favor of turning So, here's my full proposal (please let me know what you think):
def explicit_settings(fn):
SETTINGS_KEYS = ('api_lang', 'rate_limit', 'rate_limit_min_wait', 'user_agent')
def wrapped_fn(*args, **kwargs):
global SETTINGS
for key in kwargs:
if key not in SETTINGS_KEYS:
continue
# usually this will just be an assignment, but
# for rate limiting there is some state to modify as well
set_setting(key, kwargs.pop(key), SETTINGS)
kwargs['SETTINGS'] = SETTINGS
return fn(*args, **kkwargs)
return wrapped_fn Then, if we apply that to each of the relevant What do you guys think? |
This change adds a lang_links property method to WikipediaPage, which returns a list of links to versions of the Wikipedia page in different languages. The result is a list of (language_prefix, title) tuples.