Live at https://analytics.vipulnaik.com/
- Project layout
- Database layout
- Setting up Google Analytics (Universal Analytics aka pre-2023)
- Setting up the portal
- Migrating from Universal Analytics to Google Analytics 4
- Install the new Python client for GA4 Data API
- Enable the new API in your Google Cloud Project
- Switch over each website to use GA4 instead of Universal Analytics
- For each new GA4 property, add the email of your Google Developers/Cloud Project so it can access the data
- Add the property ID and GA4 start date to projects.sql
- Run the new fetching script
- See also
fetch_pageviews_ga4.py
: this script queries Google Analytics (GA 4) using its API, and stores the data in a local database. This script should be run in a cron job. We used to havefetch_pageviews.py
that did the equivalent thing with Universal Analytics (UA) but Google no longer supports UA and has deleted all historical UA data starting July 2024, so we've removed that script in this commit.print_table.py
: this script queries the local database and prints pageviews data in an HTML table.sql/
: table schema and some data for the local database.access-portal/
: if this project is served over the web, this is where users will access the site.
The database should be named analyticstable
.
The database has two tables:
projects
: this stores some metadata about each project. The data for this is stored directly in the SQL file.pageviews
: this stores the pageviews. The data for this is inserted byfetch_pageviews_ga4.py
.
There are several parts to this:
- You must set up a project on the Google developers console. From the project, you will create a service account, which will have an email address associated with it as well as a credentials JSON file.
- For each website tracked via Google Analytics, you must give your special email address access to the analytics data.
- For each website tracked via Google Analytics, you must add the project
metadata to
sql/projects.sql
. - Optional: change the Google Analytics quota.
Each step is covered below.
A project seems to be some sort of umbrella thing for the Google developers console, whereas a service account is a thing that gets an email. You will create a single project and a single service account for that project.
- Go to https://console.developers.google.com
- Create a new project somehow (I can't go back to the screen displayed before I had any projects, so I can't write up the exact steps).
- Enable analytics API for your project: https://www.dundas.com/support/learning/documentation/connect-to-data/how-to/connecting-to-google-analytics
- Go to https://console.developers.google.com/iam-admin/serviceaccounts
- Click "create service account". You can enter arbitrary stuff here; you don't need to give the service account any permissions.
- Create a key for the service account. If you're creating a new service
account, there should be a section called "Create key (optional)". Click
"create key", and make sure JSON is selected. Now create the key and save it
as
key.json
(I think Google's documentation refers to this file asclient_secrets.json
) in this repo's directory. If you already have a service account, in the table that lists your service accounts, there is a column called "Actions". Click the hamburger and select "Create key", and save askey.json
(I think Google's documentation refers to this file asclient_secrets.json
).
For this step, you will need the email address associated with your service account.
From the analytics home page, you can go to admin → then under the View column User Management → and then the (+) in the upper right corner → "Add new users" → and use email address for your service account, and make sure "Read & Analyze" is checked, then click Add.
Admin → under the View column "View Settings" → under "Basic Settings" there should be a "View ID" heading.
- Go to https://console.developers.google.com/iam-admin/serviceaccounts
- In left bar, click "Quotas"
- Under services dropdown, restrict to "Google Analytics Reporting API".
- The one that might cause problems is "Google Analytics Reporting API - Requests per 100 seconds", so click that, and in the next screen you can click the pencil to change it up to 50,000.
# Create database
mysql -e "create database analyticstable"
# Create tables
mysql analyticstable < sql/projects.sql
mysql analyticstable < sql/pageviews.sql
mysql analyticstable < sql/path_pageviews.sql
In this repo's directory, create a file called login.py
with the
database login info for (ideally) a read-only user that is needed for
web access.
For example:
USER = "analytics"
DATABASE = "analyticstable"
PASSWORD = ""
Also make a file called login_rw.py
with the database login info for
a read-write user that can be used to run the pageviews-fetching
scripts.
USER = "analytics_rw"
DATABASE = "analyticstable"
PASSWORD = ""
Run:
sudo pip install --upgrade google-api-python-client mysql-connector-python oauth2client
(source)
If everything is set up correctly, you should now be able to run the data fetching script:
./fetch_pageviews_ga4.py key.json
If it works, try running it again:
./fetch_pageviews_ga4.py key.json
The script checks the most recently stored date for each project, and only queries for more recent data, so the second run should be much quicker. (If a website got zero pageviews on the most recent days, it might try to query a small number of dates.)
NOTE: Data from before the project was switched to GA 4 is only available by restoring from a backup; it cannot be obtained from the Google Analytics API any more. So, you won't be able to reconstruct the data as currently seen on analytics.vipulnaik.com. However, data July 2023 onward should use GA 4 so data from that point onward should be reconstructible.
make fetch_tablesorter
If you're trying to run the website locally, run from the access-portal/
directory:
php -S localhost:8000
If you're trying to serve the website over the web, edit your nginx/apache
config; make sure the root directory is access-portal/
.
Issa decided to use virtualenv following the official instructions, but you can probably get away with using the system pip.
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install google-analytics-data
Go to this page and click on the blue button that says "Enable the Google Analytics Data API v1". Select the project that you created in "Set up a Google developers project and a service account". If for some reason there is no existing project, you can maybe go to here and click the blue botton that says "Enable the Google Analytics Data API v1" to create a new project. (Note: for me the button did not work in Firefox and I had to use Chrome.)
You may need to download a new key JSON file (when I was testing this, I created a new project and it gave me a key JSON file).
(instructions given separately; TODO maybe add them here)
For each new GA4 property, add the email of your Google Developers/Cloud Project so it can access the data
I'm not sure if this step is necessary. But in GA4 you can go to Admin (leftmost sidebar, way at the bottom) -> Property column -> Property Access Management -> Plus sign (top right corner) -> Add users -> Enter the email address, and make sure the "Viewer" role is selected ("Data restrictions" don't matter I think) -> Add (top right corner).
Go to https://github.com/riceissa/analytics-table/blob/master/sql/projects.sql and edit the file
so that the new columns property_id
and ga4_start_date
have values. To find the
property ID, go to GA4 and navigate to Admin (leftmost sidebar, way at the bottom) ->
Property column -> Property Settings -> Find the "PROPERTY ID" on the right
side of the screen, you can click the copy button to copy the ID.
If everything is set up correctly, you should now be able to run the GA4 data fetching script:
./fetch_pageviews_ga4.py key.json
If it works, try running it again:
./fetch_pageviews_ga4.py key.json
The new script works exactly like the old one. It checks the most recently stored date for each project, and only queries for more recent data, so the second run should be much quicker. (If a website got zero pageviews on the most recent days, it might try to query a small number of dates.)