turtle CURIE highlight differs if reference starts with number #1553

Closed
elf-pavlik opened this issue Sep 24, 2020 · 4 comments

@elf-pavlik elf-pavlik commented Sep 24, 2020

This screenshot should show the difference

image

It seems that references starting with a number are highlighted incorrectly.


@elf-pavlik elf-pavlik commented Sep 25, 2020

Just in case, here is the HTML of both snippets from the screenshot.

Figure 10

alice
<c- p="">:</c->
<c- mi="">6</c->
a
<c- mi="">86</c->
b
<c- mf="">e7</c->
b
<c- mi="">-3</c->
f
<c- mi="">60-4</c->
cc
<c- mi="">5-8</c->
ab
<c- mi="">9</c->
-f
<c- mi="">259693700</c->
d
<c- mi="">3</c->
<c- b="">a</c->
<c- nn="">interop:</c->
<c- f="">Registrar</c->
<c- p="">;</c->
<c- nn="">interop:</c->
<c- f="">hasRemoteDataRegistrySet</c->
<c- nn="">alice:</c->
<c- f="">ba4da3ec-dea4-41b2-be02-e4bf7a9477df</c->
<c- p="">.</c->

Figure 11

<c- nn="">alice:</c->
<c- f="">ba4da3ec-dea4-41b2-be02-e4bf7a9477df</c->
<c- b="">a</c->
<c- nn="">interop:</c->
<c- f="">RemoteDataRegistrySet</c->
<c- p="">;</c->
<c- nn="">interop:</c->
<c- f="">hasRegistry</c->
alice
<c- p="">:</c->
<c- mi="">6</c->
f
<c- mf="">6e4241</c->
<c- mi="">-75</c->
a
<c- mi="">2-4780-9</c->
b
<c- mi="">2</c->
a
<c- mi="">-40</c->
da
<c- mf="">53082e54</c->
<c- p="">.</c->

I also tried changing the first snippet from Turtle to ShEx, and in that case it was highlighted correctly.
image

shex highlight of figure 10

<c- nn="">alice</c->
<c- p="">:</c->
<c- f="">6a86be7b-3f60-4cc5-8ab9-f259693700d3</c->
<c- k="">a</c->
<c- nn="">interop</c->
<c- p="">:</c->
<c- f="">Registrar</c->
<c- p="">;</c->
<c- nn="">interop</c->
<c- p="">:</c->
<c- f="">hasRemoteDataRegistrySet</c->
<c- nn="">alice</c->
<c- p="">:</c->
<c- f="">ba4da3ec-dea4-41b2-be02-e4bf7a9477df</c->
<c- p="">.</c->

I think these lines of code may be responsible for the difference:

Turtle:

# PNAME_NS PN_LOCAL (with simplified character range)
patterns['PrefixedName'] = r'%(PNAME_NS)s([a-z][\w-]*)' % patterns

# PrefixedName
(r'%(PrefixedName)s' % patterns,
 bygroups(Name.Namespace, Name.Tag)),

ShEx:

# prefixed names ::
(r'(' + PN_PREFIX + r')?(\:)(' + PN_LOCAL + ')?',
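To illustrate why the quoted Turtle pattern might miss these names, here is a minimal sketch using only Python's `re` module. The `PNAME_NS` value and the `split_curie` helper are assumptions made up for this illustration and are not the lexer's actual definitions; only the `PrefixedName` line is taken from the excerpt above. Because the local part must start with `[a-z]`, a name whose local part starts with a digit does not match at all, which would be consistent with the lexer falling through to its number rules.

```python
import re

# Assumed stand-in for PNAME_NS (optional prefix plus colon); the real
# definition lives in the lexer and is not quoted in this thread.
patterns = {'PNAME_NS': r'((?:[a-z][\w-]*)?:)'}

# The PrefixedName line as quoted above: the local part must start with a letter.
patterns['PrefixedName'] = r'%(PNAME_NS)s([a-z][\w-]*)' % patterns

prefixed_name = re.compile(patterns['PrefixedName'], re.IGNORECASE)


def split_curie(text):
    """Return (namespace, local) if the pattern matches at the start, else None."""
    m = prefixed_name.match(text)
    return m.groups() if m else None


# Local part starts with a letter: the pattern matches.
print(split_curie('alice:ba4da3ec-dea4-41b2-be02-e4bf7a9477df'))
# Local part starts with a digit: the pattern does not match.
print(split_curie('alice:6a86be7b-3f60-4cc5-8ab9-f259693700d3'))
```

The ShEx pattern quoted above builds the local part from `PN_LOCAL` instead, which presumably accepts a leading digit.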

@Anteru Anteru self-assigned this Sep 27, 2020

@Anteru Anteru commented Sep 27, 2020

Thanks, this looks like it could need a dedicated regex for the UUID. That said: are UUIDs required in Turtle CURIEs, or is this just by chance? The regex looks quite permissive, in the sense that something like a-a-a- would also be a valid name. Is that expected?


@elf-pavlik elf-pavlik commented Sep 27, 2020

UUIDs don't play any special role at all; I just happen to use them as random strings to create unique IRIs in a specific namespace. The problem seems to occur whenever the part after the colon (:) starts with a number.

I went to https://pygments.org/demo/#try and used this dummy snippet

@prefix ex: <https://ns.example/> .
@prefix alice: <https://alice.example/> .

alice:abc123 ex:whatever alice:123abc .

alice:123abc ex:whatever alice:abc123 .

alice:a-b-c ex:whatever alice:1-2-3 .

alice:abc-123 ex:whatever alice:abc_123 .

alice:123-abc ex:whatever alice:123_abc .

If I select the language Turtle, the highlighting is incorrect for any CURIE that has a number right after the colon. Changing the language to 'ShExC', on the other hand, results in correct highlighting. I believe the problem might be solved by using the same regex for CURIEs in Turtle (called prefixed names in the lexer, I think) that ShEx uses. In my previous comment I referenced the lines of code that seem relevant in both the Turtle and ShEx lexers.
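The proposal of sharing one prefixed-name pattern can be sketched without Pygments. In the sketch below, `PN_PREFIX` and `PN_LOCAL` are hypothetical ASCII-only simplifications (the real lexer definitions cover a wider character range), and `local_names` is a helper invented for this example. The pattern is assembled the same way as the ShEx line quoted earlier and run over the statement lines of the dummy snippet.

```python
import re

# Hypothetical ASCII-only simplifications of PN_PREFIX and PN_LOCAL.
PN_PREFIX = r'[a-z][\w-]*'
PN_LOCAL = r'[0-9a-z_][\w-]*'  # a digit is allowed as the first character

# Assembled like the ShEx rule: optional prefix, colon, optional local part.
prefixed_name = re.compile(
    r'(' + PN_PREFIX + r')?(\:)(' + PN_LOCAL + r')?', re.IGNORECASE)

# Statement lines from the dummy snippet (the @prefix lines are left out).
statements = [
    'alice:abc123 ex:whatever alice:123abc .',
    'alice:123abc ex:whatever alice:abc123 .',
    'alice:a-b-c ex:whatever alice:1-2-3 .',
    'alice:abc-123 ex:whatever alice:abc_123 .',
    'alice:123-abc ex:whatever alice:123_abc .',
]


def local_names(line):
    """Collect the local part of every prefixed name found in the line."""
    return [m.group(3) for m in prefixed_name.finditer(line)]


for line in statements:
    print(local_names(line))
```

With this simplified pattern, local parts such as `123abc` and `1-2-3` are captured whole, just like the ones that start with a letter.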

EDIT

The SPARQL lexer also correctly handles prefixed names that have a number right after the colon. It seems to use the same regex as ShEx:

# prefixed names ::
(r'(' + PN_PREFIX + r')?(\:)(' + PN_LOCAL + r')?',


@elf-pavlik elf-pavlik commented Oct 12, 2020

I don't work with Python, so I don't feel confident creating a PR to fix this issue. Still, I was poking around and have this commit, which aligns the Turtle lexer with the ShExC lexer in how they treat prefixed names. The RDF lexers didn't have tests, so I added identical ones for ShExC and Turtle covering this specific case. elf-pavlik@9f29da0
