![Hive](./images/Hive.png)

```SQL

CREATE TABLE sales_data_partitioned (
  product_id INT,
  quantity INT,
  revenue DECIMAL(10, 2)
)
-- `date` is a reserved keyword in recent Hive versions, so it is quoted with backticks
PARTITIONED BY (`date` STRING);

CREATE TABLE sales_data_bucketed (
  product_id INT,
  quantity INT,
  revenue DECIMAL(10, 2)
)
CLUSTERED BY (product_id) INTO 5 BUCKETS;
```
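
To make the difference between the two layouts concrete, here is a minimal pure-Python sketch of where a row would land. It assumes Hive's legacy bucketing for `INT` columns (the hash of an integer is the integer itself, masked to be non-negative, then taken modulo the bucket count); newer Hive versions may use a different hash, so treat the bucket numbers as illustrative. The helper names `partition_dir` and `bucket_for` are hypothetical.

```python
NUM_BUCKETS = 5  # matches INTO 5 BUCKETS in the DDL above


def partition_dir(table: str, date: str) -> str:
    """Each partition value gets its own key=value subdirectory under the table."""
    return f"{table}/date={date}"


def bucket_for(product_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assumed legacy rule: (hash & Integer.MAX_VALUE) % num_buckets, with hash(int) == int."""
    return (product_id & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    print(partition_dir("sales_data_partitioned", "2024-01-01"))
    for pid in (3, 7, 12):
        print(pid, "->", bucket_for(pid))
```

Partitioning splits data by column value into directories (useful for pruning by `date`), while bucketing spreads rows over a fixed number of files by hash (useful for joins and sampling on `product_id`).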


## ORC

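- Optimized Row Columnar
- Data stored by columns rather than rows (rows are grouped into stripes; within a stripe each column is stored separately)
- Compression: ZLIB, Snappy
- Indexing: built-in min/max statistics per stripe and row group let readers skip irrelevant data
- Encoding: type-specific, e.g. run-length and dictionary encoding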
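
As a rough illustration of how the compression choice is expressed, the sketch below builds the HiveQL for an ORC-backed table. `STORED AS ORC` and the `orc.compress` table property are standard Hive; the helper `orc_table_ddl` and the table name `sales_data_orc` are made up for this example.

```python
def orc_table_ddl(table, columns, compression="SNAPPY"):
    """Return a CREATE TABLE statement for an ORC table with the given codec.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        "STORED AS ORC\n"
        f"TBLPROPERTIES ('orc.compress'='{compression.upper()}');"
    )


if __name__ == "__main__":
    ddl = orc_table_ddl(
        "sales_data_orc",  # hypothetical table name
        [("product_id", "INT"), ("quantity", "INT"), ("revenue", "DECIMAL(10, 2)")],
        compression="zlib",
    )
    print(ddl)
```

The generated statement could then be run from the Hive CLI or passed to `spark.sql(...)` in a Hive-enabled Spark session.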

### Related
- Parquet:
    - does not support clustering
    - partitioning optimizes joins
    - Fast load/read
    - Supports nested data!
- Avro
    - row-based (not columnar), but supports compression

## Optimization Params

| Parameter                    | Description                                                            |
| ---------------------------- | ---------------------------------------------------------------------- |
| `hive.optimize.skewjoin`     | Enables skew join optimization for heavily skewed join keys.           |
| `hive.auto.convert.join`     | Converts a join to a MapJoin (in-memory) when one side is small enough. |
| `mapreduce.job.reduces`      | Manually sets the number of reducers.                                  |
| `mapreduce.map.memory.mb`    | Memory per map task (MB).                                              |
| `mapreduce.reduce.memory.mb` | Memory per reduce task (MB).                                           |
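
These parameters are usually applied per session with `SET key=value;`. The sketch below renders such statements from a dictionary; the values shown are placeholders for illustration, not tuning recommendations, and `to_set_statements` is a hypothetical helper.

```python
# Illustrative values only; pick numbers that fit your cluster and workload.
SETTINGS = {
    "hive.optimize.skewjoin": "true",
    "hive.auto.convert.join": "true",
    "mapreduce.job.reduces": "8",
    "mapreduce.map.memory.mb": "4096",
    "mapreduce.reduce.memory.mb": "8192",
}


def to_set_statements(settings):
    """Turn {parameter: value} into a list of HiveQL SET statements."""
    return [f"SET {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    for statement in to_set_statements(SETTINGS):
        print(statement)
```

Each statement can be pasted at the top of a Hive script, before the query it is meant to affect.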


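```python
from pyspark.sql import SparkSession

# Hive support makes saveAsTable register the table in the Hive metastore.
spark = SparkSession \
    .builder \
    .appName("hive-access") \
    .enableHiveSupport() \
    .getOrCreate()

# Placeholder DataFrame (illustrative data); substitute your own.
df = spark.createDataFrame([(1, "Alice")], ["id", "name"])

# Persist the DataFrame as a Hive table named "employee".
df.write.saveAsTable("employee")
```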