This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

why a lot of @@ in the data #12

Closed
yeliu918 opened this issue May 31, 2020 · 0 comments


@yeliu918

Hi,

I noticed that there are a lot of "@@" markers in the data. For example, "Gut@@ ach : Incre@@ ased safety for pedestri@@ ans". It seems that "Incre@@ ased" means "Increased". Should we revise the files by deleting the "@@" and joining the two tokens into one? As far as I can tell, preprocess.py ignores this, so the dictionary it creates contains a lot of entries that end in "@@".
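
If the "@@" is a BPE continuation marker in the subword-nmt style (an assumption based on the example above, not something confirmed by the repository), the pieces could be joined with something like the sketch below. The name merge_bpe is a hypothetical helper for illustration and is not part of preprocess.py.

```python
import re

# Assumption: "@@ " marks a subword that continues into the next token,
# as in subword-nmt style BPE. This pattern also drops a dangling "@@"
# at the end of a line.
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def merge_bpe(line: str) -> str:
    """Join BPE pieces back into whole words (hypothetical helper)."""
    return _BPE_MARKER.sub("", line)


if __name__ == "__main__":
    print(merge_bpe("Gut@@ ach : Incre@@ ased safety for pedestri@@ ans"))
```

Whether the data should actually be merged depends on whether the model is meant to train on subword units; if it is, the "@@" entries in the dictionary would be expected.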

Best,
Ye

@yeliu918 closed this as completed on Jun 1, 2020