Not working with Bokeh 3.0.0 #34

Closed
pnadelofficial opened this issue Nov 2, 2022 · 7 comments · Fixed by #35

@pnadelofficial

Hello,

I was using bulk today for Twitter data. It worked great and I was blown away by the results, but I needed to uninstall Bokeh 3.0.0 and downgrade to Bokeh 2.4.3. Once I did this, everything worked well. I'm writing to ask if you plan on updating the package to support Bokeh 3.0.0 and, if not, whether you could add a note about this version issue in the GH repo.

Apologies if this is the wrong place to bring this up and thanks so much for this package!

Thanks!

@koaning
Owner

koaning commented Nov 3, 2022

Ah, good call. Yeah that came out a few days ago and it seems that it broke something.

Will have a look, thanks for reporting!

@koaning
Owner

koaning commented Nov 3, 2022

I've set up a max version for now. That way, it should not break while I figure out what to do about v3 of Bokeh.

Can you check if v0.1.2 works out of the box now?
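
For anyone hitting the same error, here is a minimal sketch of checking the installed Bokeh version against an upper bound. The thread does not state the exact bound that bulk pins, so the `max_major=3` cutoff (anything below 3.0.0) and the helper names `major_version`, `bokeh_is_supported`, and `installed_bokeh_version` are assumptions for illustration, not bulk's actual code.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '3.0.0'."""
    return int(version.split(".")[0])


def bokeh_is_supported(version: str, max_major: int = 3) -> bool:
    """True if `version` is below the assumed (exclusive) upper bound."""
    return major_version(version) < max_major


def installed_bokeh_version():
    """Return the installed Bokeh version string, or None if not installed."""
    try:
        return metadata.version("bokeh")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bokeh_version()
    if found is None:
        print("Bokeh is not installed")
    elif bokeh_is_supported(found):
        print(f"Bokeh {found} is below the assumed upper bound")
    else:
        # Assumed workaround, matching the manual downgrade described above.
        print(f"Bokeh {found} is 3.x or newer; consider: pip install 'bokeh<3'")
```

Under that assumption, 2.4.3 (the version reported to work above) passes the check and 3.0.0 does not.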

@koaning koaning reopened this Nov 3, 2022
@pnadelofficial
Author

Yes, version 0.1.2 works out of the box now. Thanks for the updates!

@koaning
Owner

koaning commented Nov 3, 2022

Grand. Thanks for reporting!

@koaning koaning closed this as completed Nov 3, 2022
@koaning
Owner

koaning commented Nov 3, 2022

Also, I'm a bit curious. What embeddings are you using for Twitter? Stuff from embetter?

@pnadelofficial
Author

Yes, I'm following the example you left in the readme for bulk, though I swapped the 'all-MiniLM-L6-v2' model for the 'distiluse-base-multilingual-cased-v2' model, because the tweets are mostly in Persian with some English and I've had success with that model on multilingual data in the past. There are about 190,000 tweets in total, all about the protests and unhappiness with the government in Iran. I've gotten some really interesting results already, even though I only started with the first 10,000 tweets, so I appreciate the work you've put into bulk.

@koaning
Owner

koaning commented Nov 4, 2022

Happy to hear it.

At some point, I might also recommend taking a step back from bulk and using a trained ML model and a proper labeling tool to proceed. Bulk is meant to help you get started, but the labels aren't 100% accurate, and I'd assume that a fine-tuned model might be of more help later. Just a gut feeling, though.
