duplicate tweets insertion in the database #4
Comments
Related to c7c1c8d
My bad, it is not related to that. It is actually caused by tweet ids being larger than the maximum possible for MySQL integer columns, so they get silently truncated: "We quickly realized the issue and ALTERed the impacted tables, but the MySQL behavior of silently truncating all values larger than 2147483647 with the only fail-safe being the primary key constraint was worrying. Any columns that lacked a similar constraint would be experiencing silent data corruption." For reference, 2^31 − 1 = 2,147,483,647.
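To make the failure mode concrete, here is a minimal sketch of why oversized ids end up as duplicates. It assumes a signed 32-bit INT column on a server that is not in strict SQL mode, where out-of-range values are clipped to the column limit. The function name and the two tweet ids are made up for illustration and are not taken from this repo.

```python
# Illustration only: mimics what a signed 32-bit INT column ends up holding
# when out-of-range values are clipped instead of rejected (assumption:
# MySQL is not running in strict SQL mode).
INT_MAX = 2**31 - 1   # 2,147,483,647
INT_MIN = -2**31


def store_as_int(value):
    """Return the value a clipping signed INT column would store."""
    return max(INT_MIN, min(value, INT_MAX))


# Two hypothetical tweet ids, both far larger than INT_MAX.
first_id = 300000000000000001
second_id = 300000000000000002

stored_first = store_as_int(first_id)
stored_second = store_as_int(second_id)

print(stored_first, stored_second)      # 2147483647 2147483647
print(stored_first == stored_second)    # True: same primary key twice
```

Under that assumption, every tweet after the first lands on the same key, which would explain the duplicate errors; a BIGINT column is wide enough to keep the ids distinct.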
This is related to the database creation flow. I'll fix the migrations at https://github.com/sunlightlabs/politwoops so that what they create matches the schema defined in this repo.
Can you post an actual active schema dump from the system?
Yes, you do need to alter tweets.id to bigint. It's the only schema change required. I meant to fix that in the Ruby migrations. A patch would be welcome; otherwise I will fix it soon.
I got it running. I'll prepare a patch, a basic one to get the thing up for anyone, but we'll need to add some missing migrations too. Will keep you posted.
Sent you a pull request: propublica/politwoops_sunlight#4
When running politwoops-worker.py to read the tweets from beanstalkd and insert them into MySQL:
PYTHONPATH=$PYTHONPATH:`pwd`/lib ./bin/politwoops-worker.py
the first tweet gets inserted, but it seems it is not deleted from beanstalkd, so shortly afterwards the script tries to insert it again, leading to duplicate errors.
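The behaviour suspected in this report can be sketched as a small simulation. This is not the actual worker code: the in-memory queue, `run_worker`, and the `DuplicateKey` exception are hypothetical stand-ins, and the only assumption carried over from beanstalkd is that a reserved job which is never deleted eventually becomes ready again.

```python
from collections import deque


class DuplicateKey(Exception):
    """Stands in for the duplicate-entry error a database driver would raise."""


def run_worker(job_bodies, delete_after_insert, max_steps=5):
    """Simulate a reserve/insert/delete loop and count duplicate errors."""
    ready = deque(job_bodies)   # jobs waiting to be reserved
    table = set()               # primary keys already stored
    duplicate_errors = 0

    for _ in range(max_steps):
        if not ready:
            break
        tweet_id = ready.popleft()          # "reserve" the next job
        try:
            if tweet_id in table:
                raise DuplicateKey(tweet_id)
            table.add(tweet_id)             # "insert" the tweet
        except DuplicateKey:
            duplicate_errors += 1
        if not delete_after_insert:
            ready.append(tweet_id)          # never deleted, so it comes back
    return duplicate_errors


print(run_worker([101], delete_after_insert=False))  # 4
print(run_worker([101], delete_after_insert=True))   # 0
```

In the first call the single job is inserted once and then fails on each of the remaining four passes; in the second it is removed after the insert and no error occurs.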