# Supported Tasks

As you migrate your existing data pipelines into Databricks, your data engineering team wonders whether they have selected the right language for development. You don't want to refactor everything, but you do want to be sure that the chosen language is a good long-term fit for your problem and that Databricks supports it.

**Instructions**

Which of the following tasks would be supported in Databricks?

- Creating a custom function for a business-specific calculation in Python ✅
- Joining two tables together in SQL ✅
- Creating a complex function that iterates over each row in Ruby ❌
- Performing advanced statistical techniques on your data ✅
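
To make the first task concrete, here is a minimal sketch of a business-specific calculation written as an ordinary Python function. The royalty rule, the function name `royalty_due`, and its parameters are hypothetical and chosen only for illustration. On Databricks, a function like this could be called from a notebook cell or wrapped as a PySpark UDF; that step is not shown here.

```python
def royalty_due(units_sold: int, list_price: float, royalty_rate: float = 0.10) -> float:
    """Hypothetical business rule: royalty owed on a title, rounded to cents."""
    if units_sold < 0 or list_price < 0:
        raise ValueError("units_sold and list_price must be non-negative")
    # Revenue multiplied by the royalty rate, rounded to two decimal places
    return round(units_sold * list_price * royalty_rate, 2)


# Example call with made-up numbers
print(royalty_due(1200, 15.0))
```

Keeping the rule in a plain function makes it easy to test on its own before applying it to a table.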

# Development on Databricks

You lead a cross-functional team of data professionals at Sierra Publishing, and you want them to start using Databricks in some capacity. Each person has their own way of working, and they would like their new workflow to be similar to what they have now.

**Instructions**

For each of the setups listed, recommend the development UI that the person should use. Try to have them work directly in the Databricks platform as much as possible.

- **Databricks Notebooks:** Sally has several Python scripts that she needs to migrate; Carlos wants to develop a model in R.
- **SQL Editor:** Tom has created several tables using SQL; Tina has several queries that provide answers for the business.
- **Databricks Connect:** Amrinder has several testing frameworks that he has built into his VS Code environment; Julie loves RStudio and wants to keep working with it.

# Run your first notebook

In this exercise, you will create your first notebook and run some Python code.

The data scientist at Sierra Publishing has just received access to the Databricks environment and has reached out to you about some of the work you have been doing. They suggested it would be good to get an idea of how to work in Databricks, specifically using features in the Workspace page. The Workspace is where you can store notebooks, queries, and dashboards you are working on. You have decided that you will both play around with the dataset you uploaded earlier and do some basic exploration.

*We recommend continuing from your earlier work, but if you lost progress, you will have to recreate the data table `bx_books_file` using the information in the Adding your datasets exercise, and recreate the cluster using the information in the Create your first cluster exercise.*

**Instructions**

1. Navigate to the Workspace page. In the top-right corner, click Create and then select Notebook. At the top-right of the notebook, select the Connect dropdown menu, and select the first_cluster you created in the Create your first cluster exercise.

2. Type the following code into the first notebook cell and change the parameters for your specific table. Then run it:
    ```
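    # Read the uploaded table into a Spark DataFrame
    df = (
        spark
        .read
        .table('<catalog>.<schema>.bx_books_file')
    )

    # Render the DataFrame as an interactive table in the notebook
    display(df)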
    ```
    You should see the results of your query in the cell.
    *Note: To replace the table name easily, click on the Catalog icon (third option down) on the left-hand side of the notebook. Navigate to the table name, and click on the Insert table name option that is next to the table name.*

3. Below that first cell, you should see a button to create a new Code Cell. Create a new one and type the following code in:
    ```
    from pyspark.sql.functions import col, count

    # Group the books by publication year and count the rows in each group
    display(df
            .groupBy(df['Year-Of-Publication'])
            .count()
            )
    ```

4. Based on our analysis, how many books were published in the year 1986? `384`
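
The aggregation in step 3 groups the rows by publication year and counts how many fall into each group. As a rough illustration of that logic outside Spark, here is a plain-Python sketch using `collections.Counter`. The sample rows and the helper `count_by_year` are made up for this example and are not taken from the real `bx_books_file` table, so the counts it prints do not match the exercise answer.

```python
from collections import Counter

# Made-up sample rows standing in for the table (not real data)
sample_rows = [
    {"title": "Title A", "Year-Of-Publication": 1986},
    {"title": "Title B", "Year-Of-Publication": 1999},
    {"title": "Title C", "Year-Of-Publication": 1986},
    {"title": "Title D", "Year-Of-Publication": 2001},
]


def count_by_year(rows):
    """Count rows per publication year, mirroring a group-by followed by a count."""
    return Counter(row["Year-Of-Publication"] for row in rows)


counts = count_by_year(sample_rows)

# Look up a single year, as step 4 does by reading the notebook output
print(counts[1986])
```

In the notebook itself, the same lookup is done by finding the 1986 row in the displayed result of step 3.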