
Can we upload our own dataset? #4

Closed
sandro272 opened this issue Jul 25, 2019 · 5 comments

Comments

@sandro272

Do you have scripts available, or any easy way, to convert raw data into your processed dataset files, so that I can test your model on my own dataset?

@svjan5
Member

svjan5 commented Jul 25, 2019

Hi @sandro272,
By dataset, do you mean the training dataset (Wikipedia corpus) or the evaluation data?

@sandro272
Author

@svjan5 I mean that I want to use our own dataset, so could you provide a script or method that converts raw data into your processed data (e.g. voc2id.txt, etc.)? Thank you!

@svjan5
Member

svjan5 commented Jul 26, 2019

Ok, got it. Actually, I cannot give a script for that because it requires getting a dependency parse of the text, which requires Stanford CoreNLP. So you first need to obtain a dependency parse of the text; after that, I think everything is quite straightforward. voc2id.txt contains the mapping of tokens to their unique ids, and data.txt contains the listing of tokens and dependency parse edges for each sentence in the corpus. Let me know if you face any difficulty in the whole process.
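The steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing code: the exact on-disk formats of voc2id.txt and data.txt are assumptions here, and the hardcoded parses stand in for real output from Stanford CoreNLP.

```python
# Hypothetical sketch of the preprocessing described above.
# Assumption: each sentence comes with its tokens and dependency edges
# (head index, dependent index, relation label), e.g. from Stanford CoreNLP.
parsed_sentences = [
    (["the", "cat", "sat"], [(1, 0, "det"), (2, 1, "nsubj")]),
    (["dogs", "bark"], [(1, 0, "nsubj")]),
]

# Build the token -> unique id mapping (voc2id.txt).
voc2id = {}
for tokens, _ in parsed_sentences:
    for tok in tokens:
        voc2id.setdefault(tok, len(voc2id))

# Assumed format: one "token<TAB>id" line per unique token.
with open("voc2id.txt", "w") as f:
    for tok, idx in voc2id.items():
        f.write(f"{tok}\t{idx}\n")

# Assumed format for data.txt: token ids for the sentence,
# then the sentence's dependency edges, one sentence per line.
with open("data.txt", "w") as f:
    for tokens, edges in parsed_sentences:
        ids = " ".join(str(voc2id[t]) for t in tokens)
        edge_str = " ".join(f"{h},{d},{lbl}" for h, d, lbl in edges)
        f.write(f"{ids}|{edge_str}\n")
```

The only non-trivial step in practice is the first one (running the dependency parser); once edges are available per sentence, writing both files is a simple pass over the corpus.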

@sandro272
Author

@svjan5 OK, thank you!

