## International Call Percentage  
**Verizon SQL Interview Question**  

### Question:
A phone call is considered an international call when the person calling is in a different country than the person receiving the call.

What percentage of phone calls are international? Round the result to 1 decimal place.

### Assumptions:
- The `caller_id` column in the `phone_info` table identifies a phone user, who can appear in `phone_calls` as either the caller or the receiver.

### Tables:

#### `phone_calls` Table:
| Column Name   | Type     |
|---------------|----------|
| caller_id     | integer  |
| receiver_id   | integer  |
| call_time     | timestamp|

##### `phone_calls` Example Input:
| caller_id | receiver_id | call_time            |
|-----------|-------------|----------------------|
| 1         | 2           | 2022-07-04 10:13:49  |
| 1         | 5           | 2022-08-21 23:54:56  |
| 5         | 1           | 2022-05-13 17:24:06  |
| 5         | 6           | 2022-03-18 12:11:49  |

#### `phone_info` Table:
| Column Name   | Type     |
|---------------|----------|
| caller_id     | integer  |
| country_id    | string   |
| network       | string   |
| phone_number  | string   |

##### `phone_info` Example Input:
| caller_id | country_id | network  | phone_number        |
|-----------|------------|----------|---------------------|
| 1         | US         | Verizon  | +1-212-897-1964     |
| 2         | US         | Verizon  | +1-703-346-9529     |
| 3         | US         | Verizon  | +1-650-828-4774     |
| 4         | US         | Verizon  | +1-415-224-6663     |
| 5         | IN         | Vodafone | +91 7503-907302     |
| 6         | IN         | Vodafone | +91 2287-664895     |

### Example Output:
| international_calls_pct |
|-------------------------|
| 50.0                    |

### Explanation:
There are 4 calls in total, 2 of which are international (`caller_id` 1 → `receiver_id` 5, and `caller_id` 5 → `receiver_id` 1). The percentage is therefore `2 / 4 × 100 = 50.0%`.
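
The arithmetic above can be checked with a few lines of plain Python before moving to Spark. This is only a minimal sketch: the function name `international_calls_pct` and the `country_of` lookup dictionary are illustrative choices, and the rows are copied from the example input tables.

```python
# Sample rows copied from the example input tables.
phone_calls = [(1, 2), (1, 5), (5, 1), (5, 6)]  # (caller_id, receiver_id)
country_of = {1: "US", 2: "US", 3: "US", 4: "US", 5: "IN", 6: "IN"}  # caller_id -> country_id

def international_calls_pct(calls, country_of):
    """Percentage of calls whose caller and receiver are in different countries."""
    international = [
        (caller, receiver)
        for caller, receiver in calls
        if country_of[caller] != country_of[receiver]
    ]
    return round(100.0 * len(international) / len(calls), 1)

print(international_calls_pct(phone_calls, country_of))  # 50.0
```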


In [35]:
from pyspark.sql import SparkSession
# Import only the functions used below (note: sum and round shadow the Python built-ins)
from pyspark.sql.functions import col, count, lit, round, sum, when

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("International Call Percentage").getOrCreate()

# Creating the phone_calls DataFrame
phone_calls_data = [
    (1, 2, "2022-07-04 10:13:49"),
    (1, 5, "2022-08-21 23:54:56"),
    (5, 1, "2022-05-13 17:24:06"),
    (5, 6, "2022-03-18 12:11:49")
]

phone_calls_columns = ["caller_id", "receiver_id", "call_time"]

phone_calls_df = spark.createDataFrame(phone_calls_data, phone_calls_columns)

# Creating the phone_info DataFrame
phone_info_data = [
    (1, "US", "Verizon", "+1-212-897-1964"),
    (2, "US", "Verizon", "+1-703-346-9529"),
    (3, "US", "Verizon", "+1-650-828-4774"),
    (4, "US", "Verizon", "+1-415-224-6663"),
    (5, "IN", "Vodafone", "+91 7503-907302"),
    (6, "IN", "Vodafone", "+91 2287-664895")
]

phone_info_columns = ["caller_id", "country_id", "network", "phone_number"]

phone_info_df = spark.createDataFrame(phone_info_data, phone_info_columns)

# Show DataFrames to verify
phone_calls_df.show(truncate=False)
phone_info_df.show(truncate=False)


+---------+-----------+-------------------+
|caller_id|receiver_id|call_time          |
+---------+-----------+-------------------+
|1        |2          |2022-07-04 10:13:49|
|1        |5          |2022-08-21 23:54:56|
|5        |1          |2022-05-13 17:24:06|
|5        |6          |2022-03-18 12:11:49|
+---------+-----------+-------------------+

+---------+----------+--------+---------------+
|caller_id|country_id|network |phone_number   |
+---------+----------+--------+---------------+
|1        |US        |Verizon |+1-212-897-1964|
|2        |US        |Verizon |+1-703-346-9529|
|3        |US        |Verizon |+1-650-828-4774|
|4        |US        |Verizon |+1-415-224-6663|
|5        |IN        |Vodafone|+91 7503-907302|
|6        |IN        |Vodafone|+91 2287-664895|
+---------+----------+--------+---------------+



In [47]:
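# Join phone_info twice: alias 'c' gives the caller's country, alias 'r' the receiver's.
# A call is flagged 1 (international) unless both countries are equal; the result is rounded to 1 decimal.
phone_calls_df.alias('p')\
    .join(phone_info_df.alias('c'), col('p.caller_id') == col('c.caller_id'), 'left')\
    .join(phone_info_df.alias('r'), col('p.receiver_id') == col('r.caller_id'), 'left')\
    .withColumn("is_international", when(col('c.country_id') == col('r.country_id'), 0).otherwise(1))\
    .select(
        round(100.0 * sum(col('is_international')) / count(lit(1)), 1).alias('international_calls_pct')
    ).show()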

+-----------------------+
|international_calls_pct|
+-----------------------+
|                   50.0|
+-----------------------+



In [2]:
phone_calls_df.createOrReplaceTempView('phone_calls')
phone_info_df.createOrReplaceTempView('phone_info')

In [4]:
%%sparksql
SELECT
  ROUND(
    100.0 * SUM(CASE WHEN c.country_id = r.country_id THEN 0 ELSE 1 END) / COUNT(*),
    1
  ) AS international_calls_pct
FROM phone_calls p
  LEFT JOIN phone_info c ON p.caller_id = c.caller_id
  LEFT JOIN phone_info r ON p.receiver_id = r.caller_id;

+-----------------------+
|international_calls_pct|
+-----------------------+
|                   50.0|
+-----------------------+
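
One edge case is worth keeping in mind. Both solutions use left joins and flag a call as international whenever the two countries are not known to be equal, so a call whose caller or receiver has no row in `phone_info` (a NULL country after the join) is counted as international. The plain-Python sketch below mimics that behaviour and contrasts it with a stricter rule that requires both countries to be known. The function names and the extra user id 7 are hypothetical and are not part of the original data.

```python
import builtins  # the notebook's PySpark import shadows round(), so call the built-in explicitly

country_of = {1: "US", 2: "US", 5: "IN", 6: "IN"}  # caller_id -> country_id
# (caller_id, receiver_id); user 7 is hypothetical and has no phone_info row
calls = [(1, 2), (1, 5), (5, 1), (5, 6), (1, 7)]

def pct_unknown_counts(calls, country_of):
    """Mimics CASE WHEN c = r THEN 0 ELSE 1 END: unknown countries count as international."""
    international = 0
    for caller, receiver in calls:
        c, r = country_of.get(caller), country_of.get(receiver)
        known_equal = c is not None and r is not None and c == r
        if not known_equal:
            international += 1
    return builtins.round(100.0 * international / len(calls), 1)

def pct_known_only(calls, country_of):
    """Stricter rule: international only when both countries are known and differ."""
    international = 0
    for caller, receiver in calls:
        c, r = country_of.get(caller), country_of.get(receiver)
        if c is not None and r is not None and c != r:
            international += 1
    return builtins.round(100.0 * international / len(calls), 1)

print(pct_unknown_counts(calls, country_of))  # 60.0
print(pct_known_only(calls, country_of))      # 40.0
```

On the example input every user has a `phone_info` row, so both rules give 50.0; the difference only matters when the lookup table is incomplete.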

