Deployment Guide

Step 1. Download Files

To start, clone or download this repository and navigate to the project's root directory.

This guide uses the open source MIND: Microsoft News Dataset. After you agree to the terms, download links become available for the files below. You only need the large test set along with the two small datasets; unzip each archive after downloading:

Training Set (MINDsmall_train.zip)
Validation Set (MINDsmall_dev.zip)
Test Set (MINDlarge_test.zip)

Visit https://msnews.github.io/ to download the files above.
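If you would rather unzip the archives with a script than by hand, the following is a minimal sketch. It assumes the three zip files listed above sit in a single download directory; the function name extract_archives and the choice to extract each archive into a folder named after it are illustrative, not part of this repository.

```python
import zipfile
from pathlib import Path

# Archive names come from the list above.
MIND_ARCHIVES = ["MINDsmall_train.zip", "MINDsmall_dev.zip", "MINDlarge_test.zip"]


def extract_archives(download_dir, archives=MIND_ARCHIVES):
    """Extract each archive into a sibling folder named after the archive.

    For example, MINDsmall_train.zip is extracted into MINDsmall_train/.
    Returns the list of folders that were created.
    """
    download_dir = Path(download_dir)
    extracted = []
    for name in archives:
        archive = download_dir / name
        if not archive.is_file():
            raise FileNotFoundError(f"Missing archive: {archive}")
        target = download_dir / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling extract_archives(".") from the directory that holds the downloads should leave you with the three folders that Step 3 uploads.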

Step 2. Set Up Resources

Start by deploying Azure Synapse and its related resources:

This button links to the Azure custom deployment page where you can use the azuredeploy.json as your Azure Resource Manager (ARM) template.
If you prefer to set up the resources manually, deploy Azure Synapse Analytics with a Spark pool in the workspace and access to an Azure Data Lake Storage (Gen2) account.

Step 3. Upload Sample Dataset

In this step you will upload the MIND: Microsoft News Dataset files to the Azure Data Lake Storage (Gen2) account.

You can upload the files with the Azure Storage Explorer application or with azcopy. The steps below use Storage Explorer:

Open the Microsoft Azure Storage Explorer application
Connect to your Azure account
In the Explorer, expand your subscription and find the storage account deployed in Step 2
Expand "Blob containers" and click on the cms container
Create a new folder named MicrosoftNewsDataset and double-click into it
Drag & drop or click Upload > Upload Folder... for the following unzipped MIND folders:
- MINDsmall_train/ (Training Set)
- MINDsmall_dev/ (Validation Set)
- MINDlarge_test/ (Test Set)
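For the azcopy route, the sketch below only assembles the commands so you can review them before running anything. It assumes the cms container and MicrosoftNewsDataset folder named above, a placeholder storage account name (mystorageaccount), and that you have already authenticated, for example with azcopy login. The helper build_azcopy_commands is hypothetical and not part of this repository.

```python
import shlex

# Unzipped folder names from the list above.
MIND_FOLDERS = ["MINDsmall_train", "MINDsmall_dev", "MINDlarge_test"]


def build_azcopy_commands(account_name, local_root=".", container="cms",
                          dest_folder="MicrosoftNewsDataset"):
    """Return one `azcopy copy` argument list per unzipped MIND folder.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    dest = f"https://{account_name}.blob.core.windows.net/{container}/{dest_folder}"
    return [
        ["azcopy", "copy", f"{local_root}/{folder}", dest, "--recursive"]
        for folder in MIND_FOLDERS
    ]


# "mystorageaccount" is a placeholder; replace it with your storage account name.
for cmd in build_azcopy_commands("mystorageaccount"):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a terminal, or the argument lists can be passed to subprocess.run once you are happy with them.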

Step 4. Security Access

Step 4.1 Add your IP address to Synapse firewall

Before you can upload assets to the Synapse workspace, you will need to add your IP address to its firewall:

Go to the Synapse resource you created in Step 2.
Navigate to Firewalls under Security on the left-hand side of the page.
At the top of the screen click + Add client IP
Your IP address should now be visible in the IP list (optionally, add other users' IPs)

Step 4.2 Update storage account permissions

To perform the necessary actions in the Synapse workspace, you will need to grant additional access on the storage account.

Go to the Azure Data Lake Storage Account created above
Go to Access Control (IAM) > + Add > Add role assignment
Now click the Role dropdown and select Storage Blob Data Contributor
- Search for your Synapse workspace name (e.g. recommend-synapse-workspace)
- Also add your username and any other usernames using the search bar
Click Save at the bottom
Repeat steps 2-4 to add the Contributor role to the Synapse workspace as well

Synapse Workspace as Contributor & SBDC for storage account

To enable other users to use this storage account after you create your workspace, perform these tasks:

Assign other users to the Contributor role on workspace
Assign other users the appropriate Synapse RBAC roles using Synapse Studio
Assign yourself and other users to the Storage Blob Data Contributor role on the storage account

Learn more

Step 5. Upload Notebooks

Launch the Synapse workspace (via Azure portal > Synapse workspace > Workspace web URL)
Go to Develop, click the +, and click Import to select all Spark notebooks from the repository's /src/ folder
For each of the notebooks, select Attach to > spark1 in the top dropdown
Update the account_name variable in the 01-Load-Data.ipynb notebook to the name of your ADLS storage account
Publish your new notebooks so they are saved in your workspace
Run the following notebooks in order:
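The account_name value set in the steps above is what points the notebooks at your storage account. As a rough illustration of how such a value typically becomes an ADLS Gen2 path, here is a sketch that assumes the cms container and MicrosoftNewsDataset folder from Step 3. The helper adls_path and the placeholder account name are hypothetical, and the actual notebook code may build its paths differently.

```python
def adls_path(account_name, relative_path, container="cms"):
    """Build an abfss:// URI for a file or folder in ADLS Gen2."""
    relative_path = relative_path.lstrip("/")
    return f"abfss://{container}@{account_name}.dfs.core.windows.net/{relative_path}"


# "mystorageaccount" is a placeholder; use the name of your own storage account.
account_name = "mystorageaccount"
behaviors_path = adls_path(
    account_name, "MicrosoftNewsDataset/MINDsmall_train/behaviors.tsv"
)
print(behaviors_path)

# In a Synapse Spark notebook, where `spark` is predefined, the tab-separated
# file could then be read along these lines:
# df = spark.read.csv(behaviors_path, sep="\t", header=False)
```

If a notebook fails with a path or authorization error, checking that the resulting URI matches the container and folder you created in Step 3 is a reasonable first step.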

Step 6. Explore Insights

Visualize the personalized recommendations using a Power BI dashboard:

Download Power BI Desktop
Open the reports/ContentRecommendations.pbit file
Cancel the Refresh pop-up since the data source needs to be updated
Click Transform data > Data source settings > Change Source... from the top menu
Update the Server field with your Serverless SQL endpoint, which can be found in the Azure portal under Synapse workspace > Overview
Keep database as default and click OK
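When filling in the Server field, it is easy to paste the dedicated SQL endpoint by mistake. The sketch below assumes the usual naming pattern for Synapse serverless SQL endpoints, which is the workspace name followed by -ondemand.sql.azuresynapse.net. Both helper functions are illustrative only, and the value shown on your workspace Overview page remains the authoritative one.

```python
SERVERLESS_SUFFIX = "-ondemand.sql.azuresynapse.net"


def serverless_sql_endpoint(workspace_name):
    """Return the expected serverless SQL endpoint for a workspace name."""
    return f"{workspace_name.strip().lower()}{SERVERLESS_SUFFIX}"


def looks_like_serverless_endpoint(server):
    """Check whether a pasted Server value has the serverless suffix."""
    return server.strip().lower().endswith(SERVERLESS_SUFFIX)


# The workspace name below is the example used in Step 4.2.
endpoint = serverless_sql_endpoint("recommend-synapse-workspace")
print(endpoint)
print(looks_like_serverless_endpoint(endpoint))
```

A value that fails the check, such as one ending in plain .sql.azuresynapse.net, is probably the dedicated SQL endpoint rather than the serverless one.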

Congratulations

You have completed this solution accelerator and should now have a report to explore the personalized recommendations.
