the meaning of [mg] in FilterEnglishTriplts.py #19

Closed
counten opened this issue Oct 19, 2020 · 4 comments

Comments

@counten

counten commented Oct 19, 2020

I used this file to filter the Freebase dump, but it filters out everything. Why do you use "[mg]" in the regex pattern?

@lanyunshi
Owner

Hi,

Are you using the raw Freebase dump? The "[mg]" in the regex pattern means that we only keep triplets whose entities are [mg] MIDs (ids starting with "m." or "g."), which is the format Freebase uses for entities. Triplets in other languages are filtered out.
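To illustrate what the "[mg]" character class does, here is a minimal sketch, assuming the tab-separated N-Triples layout of the Freebase RDF dump. The names `RE_NS_NS`, `RE_NS_EN`, and `keep_triple` are hypothetical; they are not the definitions in FilterEnglishTriplts.py.

```python
import re

# Hypothetical patterns, not copied from FilterEnglishTriplts.py.
# Freebase machine ids (MIDs) start with "m." or "g.", so the character
# class [mg] followed by an escaped "." accepts either prefix.
NS = r"<http://rdf\.freebase\.com/ns/"

# Head and tail are both MIDs: <ns/m.x>  <ns/relation>  <ns/m.y>  .
RE_NS_NS = re.compile(rf"^{NS}([mg]\.[^>]+)>\t{NS}([^>]+)>\t{NS}([mg]\.[^>]+)>\t\.$")

# Head is a MID, tail is an English string literal such as "Example"@en.
RE_NS_EN = re.compile(rf'^{NS}([mg]\.[^>]+)>\t{NS}([^>]+)>\t"(.*)"@en\t\.$')


def keep_triple(line: str) -> bool:
    """Keep a dump line only if it matches one of the MID-based patterns."""
    line = line.rstrip("\r\n")
    return bool(RE_NS_NS.match(line) or RE_NS_EN.match(line))
```

Under these assumptions, the award.award_winner lines quoted later in this thread would be dropped, because their subject is a type id rather than an [mg] MID, whereas a line whose subject and object are both MIDs is kept.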

Can you show me some triplets in your raw dump?

Best,
Yunshi

@counten
Author

counten commented Oct 20, 2020

Yes, I downloaded the Freebase dump from the official website. These are the first few lines of freebase-rdf-dump; I cannot find [mg] in them.

<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.01v3v75>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.05xzqqh>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.0jvby5>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.0jzfb6>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.0y_4w3_>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.04gh45>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.0p9973n>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.05jvs8>	.
<http://rdf.freebase.com/ns/award.award_winner>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.0404b3s>	.

@lanyunshi
Owner

The samples you provided will be filtered out because they don't match any of the regex patterns. The triplet below (a made-up example) would match the first regex pattern, "re_ns_ns", which matches triplets whose head entity and tail entity are both MIDs. It's strange that you get nothing after running the code; you could investigate by checking which unexpected pattern a given triplet matches.

<http://rdf.freebase.com/ns/m.01v3v75>	<http://rdf.freebase.com/ns/type.type.instance>	<http://rdf.freebase.com/ns/m.01v3v75>	.
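A minimal debugging sketch of the suggestion above, assuming the tab-separated dump format: it tries each pattern in order on every line and counts which one matched first, so lines landing in an unexpected bucket (or in "no match") stand out. The two patterns and the names `PATTERNS`, `first_match`, and `tally` are hypothetical stand-ins; swap in the patterns actually defined in FilterEnglishTriplts.py, in the same order.

```python
import re
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Optional, Pattern

# Hypothetical stand-ins for the real patterns; order matters, because
# only the first pattern that matches a line is counted.
NS = r"<http://rdf\.freebase\.com/ns/"
PATTERNS: Dict[str, Pattern[str]] = {
    "re_ns_ns": re.compile(rf"^{NS}[mg]\.[^>]+>\t{NS}[^>]+>\t{NS}[mg]\.[^>]+>\t\.$"),
    "re_ns_en": re.compile(rf'^{NS}[mg]\.[^>]+>\t{NS}[^>]+>\t".*"@en\t\.$'),
}


def first_match(line: str, patterns: Dict[str, Pattern[str]]) -> Optional[str]:
    """Return the name of the first pattern that matches the line, or None."""
    line = line.rstrip("\r\n")
    for name, pattern in patterns.items():
        if pattern.match(line):
            return name
    return None


def tally(
    lines: Iterable[str],
    patterns: Dict[str, Pattern[str]] = PATTERNS,
    limit: int = 1_000_000,
) -> Counter:
    """Count which pattern (if any) each of the first `limit` lines matches."""
    counts: Counter = Counter()
    for line in islice(lines, limit):
        counts[first_match(line, patterns) or "no match"] += 1
    return counts
```

For example, `tally(gzip.open(path, "rt", encoding="utf-8"))`, with `path` pointing at the downloaded dump. Opening the file in text mode matters: matching a `str` pattern against `bytes` lines raises `TypeError`.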

@counten
Author

counten commented Oct 21, 2020

Thanks a lot! I finally found the problem; your regex pattern works fine.

counten closed this as completed Oct 21, 2020