how to create train.csv, validation.csv, test.csv #19
Comments
The same here. |
@emes83 Or maybe we need to create them ourselves from the dataset at https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10 |
Maybe, I believe so, but how? I'd like to know how to do this so I can adapt the solution to another language in the future. |
@emes83 I tried this data: #5 (comment). Maybe you should try it. |
The same here. |
I am getting the same error, IndexError: list index out of range, and the dataset at the link has invalid characters as well. Could anyone fix the issue? |
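The thread does not show where the IndexError is raised, so the following is only a sketch of one way to read the raw file defensively. It assumes the file is Latin-1 encoded and that each line has the form sentence@label; the function name parse_phrasebank and the sample bytes are made up for illustration. Lines without an '@' are skipped instead of raising, and only the last '@' is treated as the separator.

```python
def parse_phrasebank(raw: bytes, encoding: str = "latin-1"):
    """Return (text, label) pairs from raw 'sentence@label' lines.

    The default encoding is an assumption about the dataset file,
    not something stated in this thread.
    """
    rows = []
    # splitlines() handles both '\r\n' and '\n' line endings.
    for line in raw.decode(encoding).splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        # rpartition splits on the LAST '@', so an '@' inside the
        # sentence does not produce extra fields.
        text, sep, label = line.rpartition("@")
        if not sep:
            continue  # no '@' on this line: skip it rather than fail
        rows.append((text.strip(), label.strip()))
    return rows


# Hypothetical sample input, not taken from the real dataset.
sample = (
    b"Profit rose 10% @positive\r\n"
    b"Mail us at info@example.com for details@neutral\n"
    b"\n"
    b"Sales fell@negative\n"
)
for text, label in parse_phrasebank(sample):
    print(label, "|", text)
```

With the real file, the bytes would come from open('Sentences_50Agree.txt', 'rb').read(), and the resulting pairs can be passed to pd.DataFrame(rows, columns=['text', 'label']).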
Can somebody please share the train/validation/test files in a proper, working format? |
Hello mates! I'm attaching the CSV files I made for the Financial PhraseBank dataset: FinancialPhraseBankforFinBERT.zip Valentin TASSEL |
The labels provided by @valentintsl are not balanced. I'm using the script below to create these datasets:
import pandas as pd

# Read the raw bytes and drop any that cannot be decoded.
with open('Sentences_50Agree.txt', 'rb') as f:
    data = f.read().decode(errors='ignore')
# Each line is "sentence@label"; split on the last '@' only, and accept any line ending.
df = pd.DataFrame([x.rsplit('@', 1) for x in data.strip().splitlines()], columns=['text', 'label'])
# Shuffle each class separately so every split keeps the same class ratios.
pos = df.query('label=="positive"')
pos = pos.sample(len(pos), random_state=0)
neg = df.query('label=="negative"')
neg = neg.sample(len(neg), random_state=0)
neu = df.query('label=="neutral"')
neu = neu.sample(len(neu), random_state=0)
# 20% of each class, used once for validation and once for test.
n_pos = int(len(pos)*0.2)
n_neg = int(len(neg)*0.2)
n_neu = int(len(neu)*0.2)
# The last 20% of each class is test, the 20% before it is validation, the rest is train.
pd.concat([pos[:-n_pos*2], neg[:-n_neg*2], neu[:-n_neu*2]], axis=0).to_csv('train.csv', sep='\t')
pd.concat([pos[-n_pos*2:-n_pos], neg[-n_neg*2:-n_neg], neu[-n_neu*2:-n_neu]], axis=0).to_csv('validation.csv', sep='\t')
pd.concat([pos[-n_pos:], neg[-n_neg:], neu[-n_neu:]], axis=0).to_csv('test.csv', sep='\t') |
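For anyone adapting this to another dataset or language, the same per-class 60/20/20 idea can be sketched without hard-coding the three label names. This is only an illustration using the standard library; the function name stratified_split, its parameters, and the toy rows are hypothetical and not part of the FinBERT repository.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Split (text, label) pairs into train/validation/test per class."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = int(len(items) * test_frac)
        n_val = int(len(items) * val_frac)
        # Slice by explicit start/end so a class too small to
        # contribute a test or validation row still lands in train.
        test.extend(items[:n_test])
        val.extend(items[n_test:n_test + n_val])
        train.extend(items[n_test + n_val:])
    return train, val, test


# Toy data: 10 positive, 5 negative, 20 neutral sentences.
rows = [
    (f"sentence {i}", label)
    for label, n in (("positive", 10), ("negative", 5), ("neutral", 20))
    for i in range(n)
]
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))  # 21 7 7
print(Counter(label for _, label in test))
```

Each returned list can then be turned into a DataFrame and written with to_csv(..., sep='\t') as in the script above.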
Error is
Can anyone check this and help me, please? |
You can find the instructions for creating these files in the updated README. |
Hi, how do I set up and create train.csv, validation.csv, and test.csv from the Financial PhraseBank data?