#### How do you determine whether your `unix_time_ms` values are in seconds or milliseconds?

##### 1) Check the Length or Magnitude of the Number

| Value Type       | Typical Range                  | Example         |
| ---------------- | ------------------------------ | --------------- |
| **Seconds**      | 10-digit numbers (≈ billions)  | `1633072800`    |
| **Milliseconds** | 13-digit numbers (≈ trillions) | `1633072800000` |

✅ Rule of Thumb:

- If the value is **around 10 digits** → it's likely **seconds**
- If the value is **around 13 digits** → it's likely **milliseconds**
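
The digit counts above hold only for a certain range of dates. As a rough illustration in plain Python (not part of the original notebook; the helper name `from_seconds` is made up for this sketch), the boundaries of the 10-digit range can be converted to dates:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_seconds(seconds):
    """Convert epoch seconds to an aware UTC datetime using timedelta arithmetic."""
    return EPOCH + timedelta(seconds=seconds)

first_10_digit = from_seconds(10**9)            # smallest 10-digit seconds value
last_10_digit = from_seconds(10**10 - 1)        # largest 10-digit seconds value
first_13_digit = from_seconds(10**12 // 1000)   # smallest 13-digit milliseconds value

print(first_10_digit.isoformat())   # 2001-09-09T01:46:40+00:00
print(last_10_digit.year)           # 2286
print(first_13_digit == first_10_digit)  # True
```

So the rule of thumb works for dates between September 2001 and the year 2286. Earlier dates have 9 digits in seconds and 12 digits in milliseconds, which a strict digit count would misclassify.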

##### 2) Convert a Sample Value and Check Output

In [0]:
from pyspark.sql.functions import from_unixtime, col

data_milli_seconds = [(1633072800000,)]
data_seconds = [(1633072800,)]

df_milli_seconds = spark.createDataFrame(data_milli_seconds, ["timestamp_milli_seconds"])
df_seconds = spark.createDataFrame(data_seconds, ["timestamp_seconds"])

# Interpret the millisecond value as if it were in seconds (deliberately wrong, to show the symptom)
df_timestamp_milli_seconds = df_milli_seconds.withColumn("as_milli_seconds", from_unixtime("timestamp_milli_seconds"))
display(df_timestamp_milli_seconds)

# Interpret the seconds value as seconds (correct)
df_timestamp_seconds = df_seconds.withColumn("as_seconds", from_unixtime("timestamp_seconds"))
display(df_timestamp_seconds)

timestamp_milli_seconds,as_milli_seconds
1633072800000,+53720-01-07 13:20:00


timestamp_seconds,as_seconds
1633072800,2021-10-01 07:20:00



|  timestamp_raw  |  Interpreted as seconds  |  Verdict                                 |
|-----------------|--------------------------|------------------------------------------|
|  1633072800000  |  +53720-01-07 13:20:00   |  Clearly invalid: value is milliseconds  |
|  1633072800     |  2021-10-01 07:20:00     |  Makes sense: value is seconds           |
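
The "does the date look sensible?" check can also be automated. The sketch below is plain Python and assumes a plausibility window of 2000 to 2100; the window and the names `MIN_TS`, `MAX_TS`, and `plausible_as_seconds` are illustrative choices, not something the notebook defines.

```python
from datetime import datetime, timezone

# Assumed plausibility window; adjust it to the dates you expect in your data.
MIN_TS = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()
MAX_TS = datetime(2100, 1, 1, tzinfo=timezone.utc).timestamp()

def plausible_as_seconds(value):
    """Return True if value, read as epoch seconds, falls inside the window."""
    return MIN_TS <= value <= MAX_TS

print(plausible_as_seconds(1633072800))      # True  (seconds)
print(plausible_as_seconds(1633072800000))   # False (milliseconds read as seconds)
```

A value that fails this check when read as seconds but passes after dividing by 1000 is most likely in milliseconds.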

##### 3) Divide by 1000 and Try Again

In [0]:
# Milliseconds: divide by 1000 to get seconds before converting
df_corrected_milli_seconds = df_milli_seconds.withColumn("timestamp_milli_sec", from_unixtime(col("timestamp_milli_seconds") / 1000))
# Seconds: convert directly, no scaling needed
df_corrected_seconds = df_seconds.withColumn("timestamp_sec", from_unixtime("timestamp_seconds"))
display(df_corrected_milli_seconds)
display(df_corrected_seconds)

timestamp_milli_seconds,timestamp_milli_sec
1633072800000,2021-10-01 07:20:00


timestamp_seconds,timestamp_sec
1633072800,2021-10-01 07:20:00
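
Note that `from_unixtime` formats its result as `yyyy-MM-dd HH:mm:ss` by default, so the sub-second part of a millisecond value is not shown. If that part matters, the seconds and milliseconds can be split rather than discarded. A minimal plain-Python sketch of the idea (the helper `ms_to_datetime` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ms_to_datetime(ms):
    """Convert epoch milliseconds to an aware UTC datetime, keeping the millisecond part."""
    seconds, millis = divmod(ms, 1000)
    return EPOCH + timedelta(seconds=seconds, milliseconds=millis)

print(ms_to_datetime(1633072800123).isoformat(timespec="milliseconds"))
# 2021-10-01T07:20:00.123+00:00
```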


##### 4) Automatic safe conversion pattern

In [0]:
from pyspark.sql.functions import length, col, when, from_unixtime

# Example input data
data = [
    (1633072800000,),   # milliseconds
    (1622476800,),      # seconds
    (1609459200000,),   # milliseconds
    (1633068900,),      # seconds
    (1989456700000,),   # milliseconds
    (1622599900,),      # seconds
    (1689499800000,)    # milliseconds
]
df = spark.createDataFrame(data, ["unix_time_ms"])

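# Detect the unit by digit count, then normalize every value to seconds.
# The explicit cast to string avoids relying on an implicit long-to-string cast in length().
df_conv = df.withColumn(
    "ts_seconds",
    when(length(col("unix_time_ms").cast("string")) >= 13, (col("unix_time_ms") / 1000).cast("long"))
    .otherwise(col("unix_time_ms").cast("long"))
).withColumn("ts_readable", from_unixtime("ts_seconds"))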

display(df_conv)

unix_time_ms,ts_seconds,ts_readable
1633072800000,1633072800,2021-10-01 07:20:00
1622476800,1622476800,2021-05-31 16:00:00
1609459200000,1609459200,2021-01-01 00:00:00
1633068900,1633068900,2021-10-01 06:15:00
1989456700000,1989456700,2033-01-16 02:51:40
1622599900,1622599900,2021-06-02 02:11:40
1689499800000,1689499800,2023-07-16 09:30:00
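
The digit-count test treats a 12-digit millisecond value (any date before September 2001) as seconds. One possible alternative is a magnitude threshold. The sketch below uses plain Python and an assumed cut-off of 10^11, which separates millisecond values after March 1973 from second values before roughly the year 5138; the names `MS_THRESHOLD` and `to_seconds` are illustrative.

```python
MS_THRESHOLD = 10**11  # assumed cut-off between seconds and milliseconds

def to_seconds(value):
    """Normalize an epoch value to seconds, guessing the unit from its magnitude."""
    if abs(value) >= MS_THRESHOLD:
        return value // 1000   # treat as milliseconds
    return value               # treat as seconds

print(to_seconds(1633072800000))  # 1633072800
print(to_seconds(1622476800))     # 1622476800
print(to_seconds(946684800000))   # 946684800 (2000-01-01 in ms, only 12 digits)
```

The same comparison could presumably replace the `length(...) >= 13` condition inside the `when` expression, for example `col("unix_time_ms") >= 10**11`.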


✅ Summary:

| What to Do                     | What It Tells You                                               |
| ------------------------------ | --------------------------------------------------------------- |
| Check if value has \~13 digits | Likely milliseconds                                             |
| Check if value has \~10 digits | Likely seconds                                                  |
| Use `from_unixtime` and verify | If the date looks wrong, the unit is likely wrong               |
| Try dividing by 1000           | Converts ms → s; if the date now looks right, the value was ms  |