Changing Dataset #4
Comments
The document IDs are either unique IDs provided by the data vendor or incremental IDs. If you have a CSV file with no other unique identifiers, you can use the row numbers as the document IDs.
I don't have a CSV file; all I have is the data.
I have a ticker to differentiate between companies. But in your CSV files one document seems to have multiple document IDs, and I don't understand how a document has been broken down.
One input document corresponds to one unique ID. The document file has the same number of rows as the document-ID file.
The document.txt in the input folder contains several documents, right? Each line has a unique ID, and each document also has a unique ID. How does it differentiate between the documents in all that text?
Each line in document.txt is a unique document with line breaks removed.
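For anyone preparing their own data, here is a minimal sketch of that layout, assuming you hold the raw texts in a Python list. The function names `build_corpus` and `write_corpus` and the file name `document_ids.txt` are hypothetical and not taken from this repository; only `document.txt` is named in this thread. The sketch collapses each document onto a single line and pairs it with one ID, falling back to row numbers when no vendor IDs exist.

```python
from pathlib import Path


def build_corpus(docs, ids=None):
    """Return (lines, ids): one flattened line and one ID per document."""
    # Collapse every run of whitespace, including line breaks, into one
    # space so that each document occupies exactly one line.
    lines = [" ".join(text.split()) for text in docs]
    # Assumption: with no vendor-provided IDs, use incremental row numbers.
    if ids is None:
        ids = [str(i) for i in range(len(lines))]
    ids = [str(i) for i in ids]
    if len(ids) != len(lines):
        raise ValueError("need exactly one ID per document")
    if len(set(ids)) != len(ids):
        raise ValueError("document IDs must be unique")
    return lines, ids


def write_corpus(docs, out_dir, ids=None):
    """Write the documents and a parallel ID file with matching row counts."""
    lines, ids = build_corpus(docs, ids)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "document.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    # "document_ids.txt" is a hypothetical name; use whatever the project expects.
    (out_dir / "document_ids.txt").write_text("\n".join(ids) + "\n", encoding="utf-8")
```

Note that a ticker alone only works as an ID if each company has a single document. With several documents per company, something like the ticker combined with the row number would keep the IDs unique.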
Okay, thank you.
I wanted to change the dataset but am unable to understand how you have mapped document_ids to the documents. A little clarification of that in README.md would be really helpful.
Thank you.