Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modeling for intersphinx data #5289

Merged
merged 34 commits into from Feb 27, 2019
Merged

Add modeling for intersphinx data #5289

merged 34 commits into from Feb 27, 2019

Conversation

@ericholscher
Copy link
Member

@ericholscher ericholscher commented Feb 13, 2019

This adds modeling for Intersphinx data. It will be automatically updated on build, so that we have it indexed. The goal here will be to add this to search results, but I broke up the PR's to make them easier to review and merge.

Closes #4767

@ericholscher ericholscher requested a review from Feb 13, 2019
@safwanrahman
Copy link
Member

@safwanrahman safwanrahman commented Feb 15, 2019

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

readthedocs/domaindata/models.py Outdated Show resolved Hide resolved
)
version = models.ForeignKey(Version, verbose_name=_('Version'),
related_name='domain_data')
modified_date = models.DateTimeField(_('Publication date'), auto_now=True)
Copy link
Member

@stsewd stsewd Feb 18, 2019

We could start using this standard #4864

version = models.ForeignKey(Version, verbose_name=_('Version'),
related_name='domain_data')
modified_date = models.DateTimeField(_('Publication date'), auto_now=True)
commit = models.CharField(_('Commit'), max_length=255)
Copy link
Member

@stsewd stsewd Feb 18, 2019

Do we really need this? I think we already have this from the version 🤔

Copy link
Member Author

@ericholscher ericholscher Feb 20, 2019

Yes, a file can be removed from a specific commit, and we need to track this to delete it.

'''

@property
def doc_type(self):
Copy link
Member

@stsewd stsewd Feb 18, 2019

Not sure if this is the correct name, I wasn't able to find something that could be used instead. I mean, this is just the full object not, the type of document.

Copy link
Member

@humitos humitos Feb 21, 2019

This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function:: or py:function:'spam'.

Maybe "role_name" is a better name.

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
for name, einfo in sorted(invdata[key].items()):
url = einfo[2]
if '#' in url:
doc_name, anchor = url.split('#')
Copy link
Member

@stsewd stsewd Feb 18, 2019

We could use urlparse https://docs.python.org/3/library/urllib.parse.html, but the logic here is very simple, so not really sure.

Copy link
Member Author

@ericholscher ericholscher Feb 20, 2019

Yea, feels unnecessary. It's also not a full URL, so it might be confused by it.

@ericholscher
Copy link
Member Author

@ericholscher ericholscher commented Feb 20, 2019

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

@safwanrahman Can you say a bit more about this? They aren't representing a file that we have on disk, so I don't think it's a good match. It could be useful to optionally have them point to an HTMLFile on the model, but I'm not sure what the value would be?

@ericholscher ericholscher requested a review from Feb 20, 2019
Copy link
Member

@humitos humitos left a comment

Looks good and I think this has potential. Although, I'm not 100% sure how or where you are thinking to use this.

I left some Python style comments that I think can be considered.

As I understand this PR, we are converting the objects.inv into data on our db. If I'm correct, won't this be a huge amount of data going to our db that could potentially cause an issue?

from .models import DomainData


class DomainDataAdmin(admin.ModelAdmin):
Copy link
Member

@humitos humitos Feb 21, 2019

DomainData name conflicts with our Domain model to me.

Considering that we are talking about Sphinx Domains here, I'd name all of these ones (including the Django App) as SphinxDomain. I think it will be better to get the meaning immediately when re reading about this.


project = models.ForeignKey(
Project,
related_name='domain_data',
Copy link
Member

@humitos humitos Feb 21, 2019

Same here, we will have project.domains and project.domain_data. I think it will be clearer as project.domains (current relationship) and project.sphinx_domains (or project.sphinxdomains), IMO.

def __str__(self):
return f'''
DomainData [{self.project.slug}:{self.version.slug}]
[{self.domain}:{self.type}] {self.name} -> {self.doc_name}#{self.anchor}
Copy link
Member

@humitos humitos Feb 21, 2019

This should probably be wrapped.

'''

@property
def doc_type(self):
Copy link
Member

@humitos humitos Feb 21, 2019

This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function:: or py:function:'spam'.

Maybe "role_name" is a better name.

return f'{self.domain}:{self.type}'

@property
def doc_url(self):
Copy link
Member

@humitos humitos Feb 21, 2019

I'd name it as docs_url to follow the same pattern that we have in other places.

invdata = intersphinx.fetch_inventory(MockApp(), '', object_file)
for key in sorted(invdata or {}):
domain, _type = key.split(':')
for name, einfo in sorted(invdata[key].items()):
Copy link
Member

@humitos humitos Feb 21, 2019

Isn't it clearer to use .items() in the first loop here?

for key, value in invdata.items():
    domain, _type = key.split(':')
    for name, einfo in value:

log.warning('Sphinx MockApp: %s', msg)

invdata = intersphinx.fetch_inventory(MockApp(), '', object_file)
for key in sorted(invdata or {}):
Copy link
Member

@humitos humitos Feb 21, 2019

Why these two loops need to be sorted?

for key in sorted(invdata or {}):
domain, _type = key.split(':')
for name, einfo in sorted(invdata[key].items()):
url = einfo[2]
Copy link
Member

@humitos humitos Feb 21, 2019

I'd add a small comment with previous to this line, with the expected structure of this einfo object. Like

# (key, name, url, doc type, etc)
# ('py', 'function', '/some-nice-url/python-function/', 'func', ...)

or... a link to the docs.

I found debugging this complicated later.

else:
doc_name, anchor = url, ''
display_name = einfo[3]
obj, _ = DomainData.objects.get_or_create(
Copy link
Member

@humitos humitos Feb 21, 2019

I think this will fail the first time because commit is not passed here and it's not marked as null=True.

'name',
'display_name',
'doc_type',
'doc_url',
Copy link
Member

@humitos humitos Feb 21, 2019

I suppose this is the most important field we want for the use case you mentioned, as we will want to embed the section when hovering with the mouse or something like that, right?

@safwanrahman
Copy link
Member

@safwanrahman safwanrahman commented Feb 22, 2019

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

@safwanrahman Can you say a bit more about this? They aren't representing a file that we have on disk, so I don't think it's a good match. It could be useful to optionally have them point to an HTMLFile on the model, but I'm not sure what the value would be?

@ericholscher If each of the domain data could pointed to a HTMLFile object, we can index the HTMLFile and DomainData object together in Elasticsearch. So while searching, we can show results based on the page and the section.
In current situation, as there are no relation among them, we need to index them separately and its not possible to show a user about the subsection in search results.

@@ -364,7 +364,7 @@ def setUp(self):
'api_webhook_gitlab': {'status_code': 405},
'api_webhook_bitbucket': {'status_code': 405},
'api_webhook_generic': {'status_code': 403},
'domaindata-detail': {'status_code': 404},
'sphinx_domains-detail': {'status_code': 404},
Copy link
Member

@humitos humitos Feb 25, 2019

I think it doesn't matter too much, but to keep consistency here with the other endpoints, I'd say that it should be called sphinxdomains-detail (without the _).

Copy link
Member Author

@ericholscher ericholscher Feb 25, 2019

This is autogenerated by REST framework.

Copy link
Member

@humitos humitos Feb 27, 2019

Copy link
Member Author

@ericholscher ericholscher Feb 27, 2019

Agreed. It also shouldn't be plural. Fixed.

@ericholscher ericholscher requested review from humitos, stsewd and and removed request for humitos Feb 26, 2019
@ericholscher
Copy link
Member Author

@ericholscher ericholscher commented Feb 26, 2019

Would love to get this merged for deploy this week as well, if folks have time to review.

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
Copy link
Member

@humitos humitos left a comment

It's ready to merge to me.

Just wanted to re-raise my comment about consistency on the DRF URLs. I'm not blocking on that, anyway but I think it worth to change it if possible.

@@ -364,7 +364,7 @@ def setUp(self):
'api_webhook_gitlab': {'status_code': 405},
'api_webhook_bitbucket': {'status_code': 405},
'api_webhook_generic': {'status_code': 403},
'domaindata-detail': {'status_code': 404},
'sphinx_domains-detail': {'status_code': 404},
Copy link
Member

@humitos humitos Feb 27, 2019

@ericholscher ericholscher dismissed stale reviews from humitos and stsewd via 4ea9121 Feb 27, 2019
ericholscher and others added 2 commits Feb 27, 2019
Co-Authored-By: ericholscher <25510+ericholscher@users.noreply.github.com>
stsewd
stsewd approved these changes Feb 27, 2019
@ericholscher ericholscher merged commit 1fd489c into master Feb 27, 2019
1 check passed
@delete-merged-branch delete-merged-branch bot deleted the intersphinx-modeling branch Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants