
LB-500: Fix Timescale DB setup in manage.py #761

Merged
merged 1 commit into metabrainz:timescale on Mar 20, 2020

Conversation

shivam-kapila
Collaborator

@shivam-kapila shivam-kapila commented Mar 20, 2020

Description

The command ./develop.sh manage init_ts_db -f raises a psycopg2 authentication error (JIRA ticket LB-500).

Problem

The command ./develop.sh manage init_ts_db -f fails with the error:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "timescale_lb"

This likely happens because the command first runs the queries in admin.timescale.drop_db.sql, which also drop the timescale_lb user. Although the timescale_lb user is recreated by the subsequent code and queries, no password is associated with it.

Also, the listenbrainz_web Docker container crashes because the webserver tries to create a connection to Influx.

Solution

  • Run the queries in admin.timescale.create_db.sql to create the timescale_lb user and associate a password with it

  • Remove the create_influx(app) function call in listenbrainz/webserver/__init__.py
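The role-lifecycle bug and its fix can be illustrated with a toy simulation. None of the names below (run_drop_db, run_create_db_*, the password value) are the actual ListenBrainz code; they only model the ordering described above: drop_db.sql removes the role, and only create_db.sql recreates it with a password attached.

```python
# Toy model of the timescale_lb role lifecycle. All names and the
# password value are illustrative assumptions, not ListenBrainz code.

roles = {}  # role name -> password (None means no password was set)

def run_drop_db():
    # admin.timescale.drop_db.sql drops the database AND the role
    roles.pop("timescale_lb", None)

def run_create_db_broken():
    # Old behaviour: the role reappears, but no password is attached
    roles["timescale_lb"] = None

def run_create_db_fixed():
    # Fix: run admin.timescale.create_db.sql, which creates the role
    # WITH an associated password (value here is made up)
    roles["timescale_lb"] = "timescale_lb"

def can_authenticate(user, password):
    return roles.get(user) is not None and roles[user] == password

# Broken flow: password authentication fails for "timescale_lb"
run_drop_db(); run_create_db_broken()
assert not can_authenticate("timescale_lb", "timescale_lb")

# Fixed flow: the role exists with a password, so the connection succeeds
run_drop_db(); run_create_db_fixed()
assert can_authenticate("timescale_lb", "timescale_lb")
```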

@shivam-kapila shivam-kapila changed the title LB-500: Fix Timescale DB setup in manage.py [wip] LB-500: Fix DB setup scripts in manage.py Mar 20, 2020
Member

@mayhem mayhem left a comment

Ha, that was a lot simpler than expected. Great!

@mayhem mayhem merged commit dcb7da9 into metabrainz:timescale Mar 20, 2020
@shivam-kapila shivam-kapila changed the title [wip] LB-500: Fix DB setup scripts in manage.py LB-500: Fix Timescale DB setup in manage.py Mar 20, 2020
mayhem added a commit that referenced this pull request Jul 22, 2020
* Interim check-in

* First cut of a very hacky, lots of hard-coded shit in place version of the timescale writer

* Add some exception handling, but the writer is still not starting up

* Update the python script invocation with -u for unbuffered output

* Push timescale writer update

* more testing

* Start changing the config from influx to timescale

* Add tables for creating the timescale db

* Interim check-in trying to work out how to connect to the ts pg

* Interim check-in due to coronavirus

* Setup timescale init features

* Fixing config.py.sample

* Remove port spec

* More connect string fixes

* Fix -f in Timescale Setup in manage.py

* Fix -f in Timescale Setup in manage.py (#761)

* Interim check-in

* LB-503: Add command to connect to Timescale DB via develop.sh (#767)

* Fix -f in Timescale Setup in manage.py

* Add command to connect to Timescale via develop.sh

* Use URL to connect to PSQL

* Sanity check check-in

* Reverting to what I had before

* Timescale tests setup

* Removed merge conflicts

* Fix travis.yml

* Continue converting things to timescale and nuking influx code

* Adding missing init call

* Another interim check-in

* Still fixing tests setup

* Port to_influx to to_timescale and fix a few syntax errors

* Fixing listenstore, fixing tests, fixing setup. A test nearly passes!

* One basic test passes!

* Fix get recent listens function

* Timescale tests patch 1

* Another fix

* Add test for listen_count view

* Add wait for timescale

* Ported the remainder of the timescale listenstore, but fully untested

* Tests for dump listens

* Fix fetch recent listens

* Remove a print

* Some minor changes

* More tests

* Add the inserted_before check back

* fix travis problem where bash -c was removed and an extra \ was left

* Use the correct type for the check

* Remove another backslash

* Convert the variable being passed to integer

* Add a sleep to make sure the listens got inserted before we create the
dump

* Use correct image in integration tests

* Timescale Listenstore tests done

* Replace influx_connection with timescale_connection

* Fix lastfm user tests

* Continue removing influx and fixing tests

* Fixing User tests

* Remove unnecessary merge logs

* Fixed test_user tests. Finally

* Fix test_index tests

* Fixed up the general integration tests, though a lot of tests are still failing, all structural problems are solved.

* Fix test_from_timescale test

* All unit tests fixed

* Remove more influx references

* Removed a bunch of useless prints and then got stuck on the timestamp mess AGAIN!

* Fix insert json listens into timescale

* Fix integration tests (#809)

* Fix lastfm user tests

* Fixing User tests

* Remove unnecessary merge logs

* Fixed test_user tests. Finally

* Fix test_index tests

* Fix test_from_timescale test

* All unit tests fixed

* Fix insert json listens into timescale

* Cast to int at the right place

* ALL TESTS PASS!

* Remove prints

* A couple of minor fixes

* Fix timescale tests

* Do not make use of rmq connections until after the app is running.
If running in debug mode, close the connection, don't pool it.
This cuts down on error messages in the log.
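The debug-mode connection handling described in that commit can be sketched as a toy pool. The class and method names here are illustrative assumptions, not the actual ListenBrainz RabbitMQ code:

```python
# Sketch: in debug mode, close released connections instead of pooling
# them, so stale connections don't produce noisy log errors.
# All names here are illustrative assumptions.

class FakeConnection:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class ConnectionPool:
    def __init__(self, debug: bool):
        self.debug = debug
        self.idle = []

    def release(self, conn):
        if self.debug:
            conn.close()            # debug mode: drop the connection
        else:
            self.idle.append(conn)  # production: keep it for reuse

debug_pool = ConnectionPool(debug=True)
conn = FakeConnection()
debug_pool.release(conn)
assert conn.closed and not debug_pool.idle

prod_pool = ConnectionPool(debug=False)
conn = FakeConnection()
prod_pool.release(conn)
assert not conn.closed and prod_pool.idle == [conn]
```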

* Modify tests to fetch exact listen_count in test environment

* Adding needed startup files

* Fix up DB initialization, add necessary files for a test server and some debugging for the pipeline.
Also prefix Redis keys with the env so that different servers don't conflict with one another.
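The env-prefixing idea from that commit can be sketched as follows; the prefix format, key names, and function name are assumptions for illustration, not the actual ListenBrainz cache code:

```python
# Sketch: namespace cache keys by deployment environment so that
# servers sharing one Redis instance don't clobber each other's keys.
# The "env:key" format and all names are illustrative assumptions.

def namespaced_key(env: str, key: str) -> str:
    return f"{env}:{key}"

prod_key = namespaced_key("prod", "listen_count.user.rob")
test_key = namespaced_key("test", "listen_count.user.rob")

# The same logical key no longer collides across environments
assert prod_key != test_key
assert prod_key == "prod:listen_count.user.rob"
```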

* Change timescale_lb to listenbrainz_ts

* Add namespace keys to the remaining cache lookups

* Performance improvements for fetching min/max timestamps. Two more continuous aggregates are helping with this...

* If no timestamps are given start off by 1 second.

* Try harder somewhat implemented. One unit test is failing and more tests need to be added for try_harder

* Fix redis tests

* Complete try harder logic

* Fix unit tests

* Fix integration tests

* Hide pager when try harder is not 0

* Update frontend tests

* Add test for try_harder

* pylint and pep8 fixes

* Hopefully fix export listens.

* Fix fetch all query

* Uncomment the PG section

* Upgrade to pg12, fix one test to have deterministic results

* Fix pep-8 and eslint issues. One eslint issue remains.

* Changed internal structure to use listened_at, track_name, user_name is unique key.

* Use the last timestamp if we have one for the user, rather than defaulting to now()

* Change recording_msid to track_name for unique listens

* Fix the failing test

* Remove GRANT statements

* Add a docstring or fetch single timestamp

* Fix query indentation.

* Adapt integration tests

* Remove test specific behaviour

* Fix GRANT removal consequences

* Improve time_range support and add test

* Change try_harder to search_larger_time_range

* Remove log statement

* Make spark dumps the main dumps and stop dumping per user.

* Get rid of the old style dumps and make spark dumps the main
dumping method. Rewrite full dump code to query by month. Tests have not been updated.
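Querying a dump month-by-month, as that commit describes, amounts to splitting a time range on calendar-month boundaries. A minimal sketch of that splitting logic (the helper name is an assumption, not the actual dump code):

```python
# Sketch: split [start, end) into calendar-month chunks so a full dump
# can be produced with one bounded query per month instead of one huge
# query. month_ranges is a hypothetical helper, not ListenBrainz code.
from datetime import date

def month_ranges(start: date, end: date):
    """Yield (chunk_start, chunk_end) pairs covering start..end,
    aligned to the first day of each month."""
    cur = date(start.year, start.month, 1)
    while cur < end:
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        yield cur, min(nxt, end)
        cur = nxt

ranges = list(month_ranges(date(2020, 1, 15), date(2020, 3, 10)))
assert ranges[0] == (date(2020, 1, 1), date(2020, 2, 1))
assert ranges[-1] == (date(2020, 3, 1), date(2020, 3, 10))
```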

* Further progress towards simplified dumps. Some tests still need fixing.

* Update tests to reflect new dump system

* Fix more tests and bugs in dumping code

* Remove all sleep statements from timescale tests. They *shouldn't* be needed.

* Make the created field NOT NULL, since the import script will clean up
timestamps properly. Remove some useless lines and improve others to be simpler.

* Fix dump_manager tests. What a fucking ordeal so much wasted time!

* Improve tests by mocking the min/max timestamp collection

* Adding rough draft of transmogrify

* Write to a per month dir struct

* Move the dump transmogrifyer to manage.py

* Remove the old transmogrify script

* Use the prod postgres instance

* Create a new endpoint for fetching listens

* Fix tests related to fetch listen count API

* Fix one last failing test

* Automatically generate spark dumps as it should be happening.

* Make the mogrifier work and act exactly like the current spark dumps

* If a list is already passed in to make a list, just return it.

* Fix spark dump filename and make converting to spark rows more resilient

* Fix timestamp in spark dump and also create spark dumps for incremental

* Fix number of expected dumps in dump manager tests

* Adjust expected dump counts.

* Minor pep-8 bs

* Fix snapshot test

Co-authored-by: shivam-kapila <shivamkapila4@gmail.com>
Co-authored-by: Param Singh <iliekcomputers@gmail.com>
Co-authored-by: Ishaan Shah <ishaan.n.shah@gmail.com>