Normalize Database #82

tas09009 · 2021-09-14T19:11:11Z

Changes to the schema

Remove Nicety: starred and User: autosave_timeout and autosave_enabled. This addresses issues #80 (Remove unused autosave columns) and #34 (Remove starred column).
Connected User to Profile using User.profile_id
Nicety: target_id now points to Profile.id. Added back name, along with other attributes that were previously fetched through the API.
Stint: Added in profile_id for the foreign key connection
Profile: removed faculty since that's determined by stint['type']

To use new database:

createdb <database_name>
Change the DATABASE_URL environment variable in your .env file to point to the new database, then run source .env in your terminal
flask db upgrade
Run python update-data.py from the project-level directory. The script logs each Profile being added, so feel free to comment out lines 78 and 83.

Here is a link to edit the schema directly

jasonaowen

Awesome work, @tas09009! Initial comments on the database structure below. You should now have access to the prod data, so please download a copy and try this migration locally.


jasonaowen · 2021-09-15T18:22:43Z

migrations/versions/12780a039923_populate_database.py

+    )
+    op.add_column('nicety', sa.Column('stint', sa.Integer(), nullable=True))
+    op.drop_constraint('nicety_author_id_target_id_end_date_key', 'nicety', type_='unique')
+    op.create_unique_constraint(None, 'nicety', ['author_id', 'target_id'])


This constraint needs to include the stint, so that one person can write two niceties for someone who has attended RC twice.
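A quick way to see the effect of that change, sketched with Python's built-in sqlite3 rather than Postgres and Alembic. The table is cut down to the three columns involved, and `stint_id` stands in for the stint column this diff adds as `stint`; treat it as an illustration, not the real schema.

```python
import sqlite3


def make_nicety_table(conn: sqlite3.Connection) -> None:
    # Simplified nicety table: only the columns involved in the constraint.
    conn.execute(
        """
        CREATE TABLE nicety (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL,
            target_id INTEGER NOT NULL,
            stint_id INTEGER NOT NULL,
            UNIQUE (author_id, target_id, stint_id)
        )
        """
    )


def try_insert(conn: sqlite3.Connection, author: int, target: int, stint: int) -> bool:
    """Return True if the nicety was accepted, False if the constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO nicety (author_id, target_id, stint_id) VALUES (?, ?, ?)",
            (author, target, stint),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
make_nicety_table(conn)
print(try_insert(conn, 1, 2, 10))  # True: first stint
print(try_insert(conn, 1, 2, 11))  # True: same people, second stint
print(try_insert(conn, 1, 2, 10))  # False: duplicate for the same stint
```

In the migration this would presumably become something like `op.create_unique_constraint(None, 'nicety', ['author_id', 'target_id', 'stint'])`.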

jasonaowen · 2021-09-15T18:23:57Z

migrations/versions/12780a039923_populate_database.py

+depends_on = None
+
+
+def upgrade():


We need to migrate existing data here, including finding existing Users - who may no longer be part of RC, and thus not be in the data returned by the RC API - and populating the new profile table with that data.
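One way this data step could look, as a minimal sketch: after the profile table has been filled from the RC API, copy over any user whose id has no profile yet, without touching the profiles that came from the API. It uses sqlite3 for a self-contained demo, and the `name` column on both tables is an assumption; in the Alembic migration the same SQL could presumably be run through `op.execute`.

```python
import sqlite3


def copy_missing_users(conn: sqlite3.Connection) -> int:
    """Create a profile for every user id that doesn't have one yet.

    Existing profiles (e.g. freshly fetched from the RC API) are left alone.
    Returns the number of profiles created.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO profile (id, name)
        SELECT u.id, u.name
        FROM "user" AS u
        WHERE NOT EXISTS (SELECT 1 FROM profile AS p WHERE p.id = u.id)
        """
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO "user" (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO profile (id, name) VALUES (1, 'Ada L.');
    """
)
print(copy_missing_users(conn))  # 1
print(copy_missing_users(conn))  # 0: safe to re-run
```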


jasonaowen · 2021-09-15T18:25:27Z

migrations/versions/12780a039923_populate_database.py

+    sa.Column('profile_id', sa.Integer(), nullable=True),
+    sa.Column('type_stint', sa.String(), nullable=True),
+    sa.Column('start_date', sa.Date(), nullable=True),
+    sa.Column('end_date', sa.Date(), nullable=True),


I think all of these are required, and should not be nullable.

I made these non-nullable, thanks for catching this. The only exception I found was end_date, since faculty don't have an end date.
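To make the agreed-upon nullability concrete, here is a small sqlite3 sketch of the stint table: `profile_id`, `type_stint` and `start_date` are required, and only `end_date` may be NULL. In the Alembic migration this corresponds to `nullable=False` on the first three columns and `nullable=True` on `end_date`. The type strings in the demo are illustrative and may not match what the RC API actually returns.

```python
import sqlite3

# Simplified stint table: every column is required except end_date,
# because faculty stints have no end date.
STINT_DDL = """
CREATE TABLE stint (
    id INTEGER PRIMARY KEY,
    profile_id INTEGER NOT NULL,
    type_stint TEXT NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT
)
"""


def insert_stint(conn, profile_id, type_stint, start_date, end_date) -> bool:
    """Return True if the row satisfies the NOT NULL constraints."""
    try:
        conn.execute(
            "INSERT INTO stint (profile_id, type_stint, start_date, end_date) "
            "VALUES (?, ?, ?, ?)",
            (profile_id, type_stint, start_date, end_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(STINT_DDL)
print(insert_stint(conn, 1, "retreat", "2021-09-13", "2021-12-03"))  # True
print(insert_stint(conn, 2, "employment", "2019-01-07", None))  # True: open-ended
print(insert_stint(conn, 3, "retreat", None, "2021-12-03"))  # False: no start date
```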

jasonaowen · 2021-09-15T18:29:55Z

update-data.py

+            db.session.commit()
+
+        else:
+            logging.info(f"Skipping: {id}")


I think we do need to update existing profiles! Particularly with the initial User migration, but also as people do things like change their interests or upload new photos.

Made changes to this so it adds/updates every profile, thank you!
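A rough sketch of the add-or-update behavior, using sqlite3 so it runs on its own. The `interests` and `image_path` columns are hypothetical stand-ins for whatever fields the Profile model actually stores. It tries an UPDATE first and only INSERTs when no row matched; SQLite's `INSERT OR REPLACE` is avoided because it deletes and re-inserts the row. On Postgres, `INSERT ... ON CONFLICT (id) DO UPDATE` would be the single-statement equivalent.

```python
import sqlite3


def upsert_profile(conn: sqlite3.Connection, profile: dict) -> str:
    """Update an existing profile in place, or insert it if it is new.

    Returns "updated" or "added" so the caller can log what happened.
    """
    cur = conn.execute(
        "UPDATE profile SET name = ?, interests = ?, image_path = ? WHERE id = ?",
        (profile["name"], profile["interests"], profile["image_path"], profile["id"]),
    )
    if cur.rowcount > 0:
        return "updated"
    conn.execute(
        "INSERT INTO profile (id, name, interests, image_path) VALUES (?, ?, ?, ?)",
        (profile["id"], profile["name"], profile["interests"], profile["image_path"]),
    )
    return "added"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, interests TEXT, image_path TEXT)"
)
p = {"id": 42, "name": "Ada", "interests": "compilers", "image_path": "a.png"}
print(upsert_profile(conn, p))  # added
p["interests"] = "compilers, databases"
print(upsert_profile(conn, p))  # updated
```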

jasonaowen · 2021-09-15T18:34:24Z

migrations/versions/12780a039923_populate_database.py

+    # ### end Alembic commands ###
+
+
+def downgrade():


I think there's going to end up being quite a lot more to this downgrade than is here, in order to handle existing data. I also don't think that's a worthwhile use of time - we'll take a database backup before merging this PR, and if the deployment goes wrong, we can roll back and manually restore. We can just drop the downgrade entirely!

tas09009 · 2021-09-28T00:45:40Z

Originally we talked about creating one migration script that includes the schema updates as well as the new models populated from the RC API. Here is the approach I actually took (so far):

In order to normalize the database with profile and stint, I needed to:

1. create and migrate those models
2. run the database-populating script
3. create another migration to move missing people over from user to profile.

I couldn't combine steps 1 and 3 into one migration script since the database has to be populated with profiles after the profile and stint tables are created but before the missing profiles from user can be added in. Here are the steps to run them on the production database.

1. flask db downgrade: run this command twice to downgrade to the latest migration script used in production
2. flask db upgrade b99a55eb8e08: add the Profile and Stint tables
3. python update-data.py: add the 2070 profiles
4. flask db upgrade 27d86222146d: migrate missing users (not pulled from the RC API) over to Profile. Total profiles should now be 2074.

If we don't want two migrations, then I can imagine two ways to create a single migration:

1. Create the profile and stint tables and move all users over to profile, so that profile has 451 users. Then run update-data.py to pull all RC API profiles, overwriting the 451 ids that already exist in profile but leaving the missing ones untouched.
2. Merge the two migrations above into one and call the update-data.py script in between the two migration steps.

When I do update the schema for user and nicety, I can combine that migration script with migrations/versions/27d86222146d_migrate_missing_users_into_profile.py, so we will still have a total of 2 migration scripts.

I also ran into a separate issue: apparently there are several niceties written for target_id=1932, but id=1932 doesn't exist in user or profile. This means I'm unable to change the schema to connect nicety.target_id to profile.id (author_ids are fine, btw). I ran a quick query to find the other missing ids:

SELECT 
  DISTINCT target_id 
FROM 
  nicety 
WHERE 
  target_id NOT IN (
    SELECT 
      id 
    FROM 
      profile
  );

which returned the following 4 ids: 1932, 3292, 2218, 2481
I'm not sure why this is. Until that's resolved, I can't update the schema for nicety. I just messaged James from the RC staff to ask about this; I talked to him last week about the stint ids.

I still made changes to the user and nicety models without adding them to any migration. If this is confusing, I can change them back to their original state until they are ready to be migrated.

Let me know if this is unclear or the wrong approach to take. Thanks!

tas09009 · 2021-09-28T17:13:02Z

Just asked James about those missing ids and he said they are no longer part of the RC community. I can speak more to this at our Wednesday meeting.

jasonaowen · 2021-10-27T20:29:41Z

We will need an access token for the nightly script. @mjec, since the application is using your client credentials, I think it makes the most sense to also have you generate an access token - does that sound right to you? If so, could you please add an environment variable to Heroku named RC_API_ACCESS_TOKEN?

I think the command line would be something like:

heroku config:set RC_API_ACCESS_TOKEN=my_token
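Heroku exposes config vars to the app as environment variables, so the nightly script can read the token from the environment. The sketch below only builds the request; the `Bearer` authorization scheme and the `/api/v1/profiles` URL are assumptions about the RC API, not something confirmed in this PR.

```python
import os
import urllib.request

# Assumed endpoint; check the RC API docs for the real URL and parameters.
RC_PROFILES_URL = "https://www.recurse.com/api/v1/profiles"


def build_rc_request(url: str = RC_PROFILES_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request to the RC API."""
    token = os.environ["RC_API_ACCESS_TOKEN"]  # set via `heroku config:set`
    # Assumption: the personal access token is sent as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Sending it would then be `urllib.request.urlopen(build_rc_request())`, with the response body parsed as JSON.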

mjec · 2021-10-27T20:38:10Z

RC_API_ACCESS_TOKEN has been created in Heroku and should now be available for use @jasonaowen

jasonaowen · 2021-10-27T20:40:33Z

So quick - thank you so much, @mjec!

tas09009 · 2022-01-17T20:51:55Z

I believe this captures everything we've talked about thus far. Run the following lines on the terminal to upgrade and downgrade the migration:

flask db upgrade
flask db downgrade

jasonaowen · 2022-01-21T00:38:13Z

@tas09009 and I paired on restructuring these changes. So far we've pulled two commits out, 448a9c3 and e0b4921.

e0b4921 is worth calling out, I think: I reviewed this data-only migration by examining the database by hand.

Nicety ID 221 is one of the lost niceties described in #10: the target user changed their end date, and the author created a new nicety (ID 791) that is very similar but not identical, and with the new end date.

Relevant queries:

SELECT id, author_id, target_id, date_updated, CONVERT_FROM(DECODE(text, 'base64'), 'utf8')
FROM nicety WHERE id IN (221, 791);
SELECT * FROM nicety WHERE author_id = ? AND target_id = ?;
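The nicety text appears to be stored base64-encoded, which is why the queries above wrap it in `CONVERT_FROM(DECODE(text, 'base64'), 'utf8')`. When reviewing rows from Python instead of psql, the equivalent decoding is a one-liner; a minimal sketch:

```python
import base64


def decode_nicety_text(encoded: str) -> str:
    """Mirror of CONVERT_FROM(DECODE(text, 'base64'), 'utf8') in Postgres."""
    return base64.b64decode(encoded).decode("utf-8")


# Round-trip demo with made-up text (includes a non-ASCII character).
sample = base64.b64encode("Thanks for the café chats!".encode("utf-8")).decode("ascii")
print(decode_nicety_text(sample))  # Thanks for the café chats!
```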

Nicety ID 3634 is more complicated. It has a correct end date, but it is the replacement for a previously written nicety (ID 2956) that was lost when the target Recurser changed their end date. The old nicety's text has much more detail than the new one, but it also has an incorrect end date. I believe the intent is to clean that up in the subsequent migration, which assigns stint_ids to all niceties whose end date matches no known stint. Also, @tas09009, is this one of the niceties whose author you emailed?
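Cases like 221/791 and 2956/3634 can be surfaced by looking for author/target pairs with more than one nicety. This is a sketch only, with a trimmed-down sqlite3 table; a hit here isn't automatically a lost nicety, since once stints are in place someone who attended RC twice can legitimately receive one nicety per stint from the same author.

```python
import sqlite3
from typing import List, Tuple


def repeated_author_target_pairs(conn: sqlite3.Connection) -> List[Tuple[int, int, int]]:
    """(author_id, target_id, count) for every pair with more than one nicety."""
    return conn.execute(
        """
        SELECT author_id, target_id, COUNT(*)
        FROM nicety
        GROUP BY author_id, target_id
        HAVING COUNT(*) > 1
        ORDER BY author_id, target_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nicety (id INTEGER PRIMARY KEY, author_id INTEGER, "
    "target_id INTEGER, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO nicety (author_id, target_id, end_date) VALUES (?, ?, ?)",
    [(1, 2, "2021-08-13"), (1, 2, "2021-11-05"), (3, 2, "2021-11-05")],
)
print(repeated_author_target_pairs(conn))  # [(1, 2, 2)]
```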

tas09009 · 2022-01-23T14:35:21Z

Hi @jasonaowen, I didn't send an email for this one because of what you mentioned when comparing the two niceties. But I certainly can if you'd like!

jasonaowen · 2022-03-16T20:17:26Z

migrations/versions/689a6ce963c3_remove_anonymous_columns.py

+    code maintenance.
+    """
+    op.drop_column("user", "anonymous_by_default")
+    op.drop_column("nicety", "anonymous")


I think we do still want to allow anonymous niceties! Just not saving the default preference.

jasonaowen · 2022-03-16T20:30:12Z

migrations/versions/b49314aac7b0_create_stints_for_missing_users.py

+    connection = op.get_bind()
+
+    """
+    The 156 nicetys with null dates can be grouped into 18 profiles


I think this is 156 niceties with null stint_id, rather than null dates, right?

jasonaowen · 2022-04-06T20:18:42Z

So last we talked, the migrations were in pretty good shape, and the remaining work on this PR is to update the API to use the new database structure. Thanks again for working on this, @tas09009 - I know it's a big change!

Remove nicety.starred, user.autosave_enabled and user.autosave_timeout. These columns are not used and haven't been for some time. Issue #34: Remove starred column Issue #80: Remove unused autosave columns

Giving niceties anonymously in bulk is a partially implemented feature for the user. We ultimately want to discourage this feature, and removing the column also makes code maintenance easier. Issue #79: Remove unused anonymous_by_default column

These two niceties were manually reviewed and had substantially similar but not identical content to two other niceties that are from the same author_id to the same target_id. These niceties were "lost" because the target user extended their batch and had a different end_date. Issue #82: Normalize Recurser Profiles Issue #10: Niceties lost when Recursers extend their batch

"update_data.py" fetches information from the RC API to populate the database and have an up-to-date cache. We can schedule data updates using the Heroku Scheduler. Update the "README.md" to include instructions on creating an access token for the RC API requests and set up a Heroku Scheduler. Issue #68: Normalize Recurser profiles Issue #5: Clearing the cache in a timely fashion Co-authored-by: Jason Owen <jason@jasonaowen.net>

We want to maintain a local copy of the RC API data and make our data model more consistent by including all the relevant information for a nicety. Previously, we relied on making API calls in response to each front-end request to populate data we didn't have locally. We are still doing this, but these data models set the stage to stop doing so. Issue #68: Normalize Recurser profiles Co-authored-by: Jason Owen <jason@jasonaowen.net>

Create new tables and models for Profiles and Stints, and populate them with the data from the RC API. Import functions from update_data.py. Following this migration, use the update_data.py module to pull all data from the RC API. Co-authored-by: Jason Owen <jason@jasonaowen.net>

Create Profiles for:
- 5 people: ids missing from the Profile table but found in the User table.
- 4 people: ids missing from both the User and Profile tables. These 4 people have niceties written for them but no profile to connect to. Save their names as "Former Recurser".

Co-authored-by: Jason Owen <jason@jasonaowen.net>
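The "Former Recurser" case could be expressed as a single INSERT ... SELECT. A minimal sqlite3 sketch, assuming `profile` and `user` tables with `id` and `name` columns; it only covers ids found in neither table, since ids present in the User table get their real name from there.

```python
import sqlite3


def create_former_recurser_profiles(conn: sqlite3.Connection) -> int:
    """Create a placeholder profile for each nicety target found in neither
    the profile table nor the user table. Returns the number created."""
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO profile (id, name)
        SELECT DISTINCT n.target_id, 'Former Recurser'
        FROM nicety AS n
        WHERE NOT EXISTS (SELECT 1 FROM profile AS p WHERE p.id = n.target_id)
          AND NOT EXISTS (SELECT 1 FROM "user" AS u WHERE u.id = n.target_id)
        """
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nicety (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO "user" (id, name) VALUES (5, 'Grace');
    INSERT INTO profile (id, name) VALUES (1, 'Ada');
    INSERT INTO nicety (target_id) VALUES (1), (5), (8), (8);
    """
)
print(create_former_recurser_profiles(conn))  # 1: only id 8 is in neither table
```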

The nicety.stint_id column will remain nullable until all profiles are populated with their respective stints in future migrations. Create a foreign key constraint to tie niceties to stints. Change the unique constraint to cover "author_id", "target_id", and "stint_id". We are still leaving "end_date" alone because we are not yet cleaning up the backend to use "stints" instead of "end_dates". Create a foreign key to tie target_ids to profile_ids. Co-authored-by: Jason Owen <jason@jasonaowen.net>

Populate all nicety.stint_ids based on end dates since it's a required column. This change will exclude 156 niceties due to mismatching end dates. The subsequent migration will address these 156 niceties with null stint_ids.
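A sketch of that end-date match as one UPDATE with correlated subqueries, in sqlite3 with simplified tables. Niceties whose end date matches none of the target's stints are left NULL for the follow-up migrations. One caveat worth checking: if a profile ever has two stints with the same end date, Postgres will raise an error for the scalar subquery returning more than one row, while SQLite silently picks one.

```python
import sqlite3


def populate_stint_ids_by_end_date(conn: sqlite3.Connection) -> int:
    """Set nicety.stint_id from the target's stint with the same end_date.

    Only niceties with a matching stint are touched; returns how many.
    """
    before = conn.total_changes
    conn.execute(
        """
        UPDATE nicety
        SET stint_id = (
            SELECT s.id FROM stint AS s
            WHERE s.profile_id = nicety.target_id
              AND s.end_date = nicety.end_date
        )
        WHERE stint_id IS NULL
          AND EXISTS (
            SELECT 1 FROM stint AS s
            WHERE s.profile_id = nicety.target_id
              AND s.end_date = nicety.end_date
          )
        """
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stint (id INTEGER PRIMARY KEY, profile_id INTEGER, end_date TEXT);
    CREATE TABLE nicety (id INTEGER PRIMARY KEY, target_id INTEGER,
                         end_date TEXT, stint_id INTEGER);
    INSERT INTO stint VALUES (100, 5, '2021-08-13'), (101, 5, '2021-11-05');
    INSERT INTO nicety VALUES (1, 5, '2021-08-13', NULL),
                              (2, 5, '2021-11-05', NULL),
                              (3, 5, '2021-09-01', NULL);
    """
)
print(populate_stint_ids_by_end_date(conn))  # 2
print(conn.execute("SELECT id, stint_id FROM nicety ORDER BY id").fetchall())
# [(1, 100), (2, 101), (3, None)]
```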

The 156 niceties with null stint_ids belong to 18 profiles with 18 end dates. Out of the 18 profiles, create stints for the newly created profiles using negative stint ids:
- 9 new profiles were manually created (4 Former Recursers + 5 from the User table)
- 2 of them have no niceties written for them, so there is no need to generate stints for them

This leaves 7 profiles with 8 stints (one profile has 2 stints). After this, 68 niceties with null stint_ids remain; the subsequent migration addresses them.
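A sketch of how negative ids might be allocated for the hand-made stints, assuming the rationale is that stint ids imported from the RC API are positive, so counting down from -1 can never collide with them. `next_manual_stint_id`, `create_manual_stint` and the trimmed `stint` columns are hypothetical.

```python
import sqlite3


def next_manual_stint_id(conn: sqlite3.Connection) -> int:
    """Next unused negative id: -1 first, then one below the current minimum."""
    (lowest,) = conn.execute("SELECT MIN(id) FROM stint").fetchone()
    if lowest is None or lowest >= 0:
        return -1
    return lowest - 1


def create_manual_stint(conn: sqlite3.Connection, profile_id: int, end_date: str) -> int:
    """Insert a hand-made stint for a profile and return its (negative) id."""
    stint_id = next_manual_stint_id(conn)
    conn.execute(
        "INSERT INTO stint (id, profile_id, end_date) VALUES (?, ?, ?)",
        (stint_id, profile_id, end_date),
    )
    return stint_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stint (id INTEGER PRIMARY KEY, profile_id INTEGER, end_date TEXT)")
conn.execute("INSERT INTO stint VALUES (100, 5, '2021-08-13')")  # imported from the API
print(create_manual_stint(conn, 8, "2016-05-12"))  # -1
print(create_manual_stint(conn, 8, "2016-08-11"))  # -2
```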

Populate stint_ids for 68 niceties where nicety.stint_id = null. All niceties are for the target_id's latest stint.
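Assigning the latest stint could look like the following sqlite3 sketch. "Latest" is assumed here to mean the stint with the most recent `start_date` (ISO date strings sort chronologically); niceties whose target has no stint at all are skipped, so they still show up as NULL afterwards.

```python
import sqlite3


def assign_latest_stint(conn: sqlite3.Connection) -> int:
    """Give each nicety with a NULL stint_id its target's most recent stint."""
    before = conn.total_changes
    conn.execute(
        """
        UPDATE nicety
        SET stint_id = (
            SELECT s.id FROM stint AS s
            WHERE s.profile_id = nicety.target_id
            ORDER BY s.start_date DESC
            LIMIT 1
        )
        WHERE stint_id IS NULL
          AND EXISTS (SELECT 1 FROM stint AS s WHERE s.profile_id = nicety.target_id)
        """
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stint (id INTEGER PRIMARY KEY, profile_id INTEGER, start_date TEXT);
    CREATE TABLE nicety (id INTEGER PRIMARY KEY, target_id INTEGER, stint_id INTEGER);
    INSERT INTO stint VALUES (100, 5, '2020-01-06'), (101, 5, '2021-09-13');
    INSERT INTO nicety VALUES (1, 5, NULL), (2, 5, 100), (3, 9, NULL);
    """
)
print(assign_latest_stint(conn))  # 1
print(conn.execute("SELECT id, stint_id FROM nicety ORDER BY id").fetchall())
# [(1, 101), (2, 100), (3, None)]
```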

Nicety.stint_id was allowed to be null while its values were being populated. Now that the 156 nicety.stint_ids are populated, this column won't be null again.

These are the changes to the actual backend files to finally pull data from the database rather than using RC's API. This is a draft: the files will need to be organized and old code will need to be deleted. This push is simply to have the changes on GitHub.

Update the external link to set up an RC application and restrict the text width to be less than 85 columns wide. Co-authored-by: Jason Owen <jason@jasonaowen.net>

Remove unused comments, add a downgrade, and change the formatting to match new migrations.

tas09009 · 2022-06-25T23:51:27Z

Hi @jasonaowen,

While working on the API changes, I realized they could only happen after the database migration since some of the changes we're implementing - switching from batches to stints and removing caching - have to wait until that last migration. You can see some of my WIP in the 3rd to last commit.

Do you think we should split this PR up again into two parts? It'll be easier to digest too:

Database + schema migration
API changes to incorporate migrations

It will also make it easier to push changes to the two PRs as unanticipated code changes come up, such as when we discover that a user's stint changes their date and their stint_id.

jasonaowen suggested changes Sep 15, 2021


jasonaowen reviewed Sep 15, 2021


tas09009 force-pushed the populate-database branch from 9da6ed5 to 9d05992 on January 21, 2022 00:20

tas09009 force-pushed the populate-database branch from 9d05992 to 83d7635 on January 30, 2022 21:07

tas09009 force-pushed the populate-database branch from 83d7635 to fa95414 on March 9, 2022 22:25

jasonaowen reviewed Mar 16, 2022


tas09009 changed the title ~~create script + update models~~ Normalize Database Mar 26, 2022

tas09009 force-pushed the populate-database branch from fa95414 to bfd6054 on March 30, 2022 20:02

tas09009 added 2 commits April 17, 2022 13:19

Remove unused attributes from Nicety and User

ac1c5a9


Remove user.anonymous_by_default column

2d48ffa


tas09009 mentioned this pull request Apr 22, 2022

Delete two duplicate niceties #95

Open

tas09009 and others added 7 commits April 23, 2022 15:18

Populate nicety.stint_ids using nicety.end_dates

19f1f09


tas09009 and others added 6 commits April 24, 2022 14:36

Populate nicety.stints ids with the latest stint

3633204


Change nicety.stint_id to never be nullable

a17110f


API Changes (WIP)

5424b41


Update README.md RC application link

7029843


Update previous migrations to match current ones

121a6ea


tas09009 force-pushed the populate-database branch from bfd6054 to 121a6ea on June 25, 2022 23:25
