### 🔐 **TLS/JA3 Fingerprinting: How It Works?**
**JA3 Fingerprinting** is a technique used to identify clients based on how they establish a **TLS (Transport Layer Security)** handshake.

### What is TLS?
TLS is the encryption protocol used for securing data transfer between a client (like a web browser) and a server (like a website).

### How JA3 Works:
1. **Client Hello Message:** When a browser connects to a website using HTTPS, it sends a "Client Hello" message.
2. **TLS Fingerprint:** This message includes:
   - Supported TLS versions
   - Cipher suites (encryption algorithms)
   - Compression methods
   - Elliptic curves and extensions
3. **JA3 Hash Generation:** These parameters are hashed into a unique string called a **JA3 fingerprint**.
4. **Recognition:** Servers can compare the JA3 hash against known fingerprints to identify the client type (like Chrome, Firefox, Python scripts, etc.).

### 🌐 **HTTP/2 Fingerprinting:**
HTTP/2 is the modern version of the HTTP protocol used for faster web page loading. HTTP/2 fingerprinting works by identifying how clients implement HTTP/2 features.

In [10]:
""" 
Objective: Checking JA3 fingerprint
"""
# TODO: Import requests
# TODO: Import requests from curl_cffi and give it alias to differentiate from requests
# TODO: Try to send request to https://tls.browserleaks.com/json using both library
# TODO: Print the responses and compare them
# %pip install requests curl_cffi
import requests  # Import requests biasa
from curl_cffi import requests as curl_requests  # Import curl_cffi.requests dan beri alias

# URL untuk cek JA3 fingerprint
url = "https://tls.browserleaks.com/json"

print("=== Using requests ===")
try:
    response1 = requests.get(url)
    print(response1.json())
except Exception as e:
    print(f"Error with requests: {e}")

print("\n=== Using curl_cffi.requests ===")
try:
    response2 = curl_requests.get(url, impersonate="chrome101")  # Meniru Chrome
    print(response2.json())
except Exception as e:
    print(f"Error with curl_cffi: {e}")


=== Using requests ===
Error with requests: HTTPSConnectionPool(host='tls.browserleaks.com', port=443): Max retries exceeded with url: /json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))

=== Using curl_cffi.requests ===
Error with curl_cffi: Failed to perform, curl: (35) Recv failure: Connection reset by peer. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.


In [16]:
""" 
Objective: Impersonating real browser
"""
# TODO: Use curl_cffi to send request to https://tls.browserleaks.com/json with and without impersonation
# TODO: Compare the responses and share your thoughts

from curl_cffi import requests
import json

url = "https://tls.browserleaks.com/json"

# Request WITH impersonation (Chrome 120)
print("Request WITH impersonation (Chrome 120):\n")
resp_impersonated = requests.get(url, impersonate="chrome120")
print(json.dumps(resp_impersonated.json(), indent=2))

print("\n" + "="*80 + "\n")

# Request WITHOUT impersonation
print("Request WITHOUT impersonation:\n")
resp_plain = requests.get(url)
print(json.dumps(resp_plain.json(), indent=2))


Request WITH impersonation (Chrome 120):



SSLError: Failed to perform, curl: (35) Recv failure: Connection reset by peer. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.

In [18]:
""" 
Objective: Web scraping Adidas
"""
# TODO: Scrape https://www.adidas.de/api/product-list/IE4042 and analyze the API result
# TODO: Try using requests library, if you failed, try bypass it by using custom headers
# TODO: If you succeeded, print the response and share your methods

import requests
from fake_useragent import UserAgent

# Create a UserAgent object
ua = UserAgent()
# Generate a random user agent
user_agent = ua.random

# Create a headers dictionary with the random user agent
headers = {
    "User-Agent": user_agent,
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1",
}

# Define the URL to scrape

url = "https://www.adidas.de/api/product-list/IE4042"

# Define the headers to impersonate a real browser
# Send a GET request using requests with custom headers
# response = requests.get(url, timeout=10)
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
    # Print the response content
    print("Response from Adidas API:")
    print(response.headers.get('Content-Type'))
    print(response.text[:200])

else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    print("Response content:")
    print(response.text)



Response from Adidas API:
application/json; charset=utf-8
�������N+rS)$XS�m�5�r��qp@�RDD
 ��[�9S���6���k�PO�qiBbV��9Bb�}��=g��>k2��?��IS�d�F���H�?e��D#4L�R��w/ `!t��^{�FC�V��HF�� � $�]����Q�c��>�ֳ
��������/� +���~��r^�r��yX��H�Bh߀��%


In [None]:
""" 
Objective: Web scraping Adidas
"""
# TODO: Visit https://www.adidas.com/us/
# TODO: Try to scrape one product
# TODO: Share your web scraping methods
# TODO: Push the github link here for grading
"""
i use selenium to scrape the product from adidas website
https://www.adidas.com/us/ i was try to scrape the product using requests but the response is empty
so i use selenium to scrape the product from the website
the response is empty because the website is using javascript to load the product 
so i use selenium to scrape the product from the website

github repo
https://github.com/khilmi-aminudin/course_assignment_project/tree/adidas_scrape
"""

In [None]:
""" 
Objective: Understanding JavaScript Challenge vs JavaScript Rendering
"""
# TODO: Send a request to https://bookoutlet.com
# TODO: Try to use default requests to send the request using custom headers
# TODO: Try to use curl_cffi by using impersonation
# TODO: Which one works? If not, try to use requests-html
# TODO: Whether its works or not, try to analyze the request method and share your thoughts

import requests
from curl_cffi import requests as curl_requests
from fake_useragent import UserAgent

# Create a UserAgent object
ua = UserAgent()
# Generate a random user agent
user_agent = ua.random
# Create a headers dictionary with the random user agent
headers = {
    "User-Agent": user_agent,
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1",
}

# URL to scrape
url = "https://bookoutlet.com"

# Send a GET request using requests with custom headers
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    # Print the response content
    print("Response from BookOutlet API:")
    print(response.headers.get('Content-Type'))
    print(response.text[:200])
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    print("Response content:")
    print(response.text)

# Send a GET request using curl_cffi with impersonation
response = curl_requests.get(url, headers=headers, impersonate="chrome120")

# Check if the request was successful
if response.status_code == 200:
    # Print the response content
    print("Response from BookOutlet API:")
    print(response.headers.get('Content-Type'))
    print(response.text[:200])
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    print("Response content:")
    print(response.text)

Failed to retrieve data. Status code: 403
Response content:
���  ���z��Wq�w�C�׳�T�H��8��������`����%6ƾ���5:�����\�bA���j *�V���|�V�F����|j�ɫf~�%��n����ɿ�b��_�<N��Dy�  T�H����"������0&��t`h�ͻFJ�7g��Y�A�|�駩�����v]=����
\7�A!�T3�B���f�Ȏ�'�BW��e���joJ��j?O]5;��#C��j;z�l5k,�(e�i�Jَ���O��g���e�Ub��V�(	<0t<�Ѱ��VJ�YR�TЕaɚjG�ʪ��r�R��v�6�AQ�]�M݇��K2QpҒ9���"�?�n�NVЎ�T��ɉ��n8qMW�(�nko��C5����=�:پ�:��!��������@pɿ��=3����h�v$�m���!iN�<u_M�YU�'Ic�8�I:}B 6$͟�Y���/�Oy��������7�����D�����a̅ǳ��&` �e�#�'P���h���A�d����nJk��&Ȓ�y����)�d��?�%�{k�0�0D
�;�I�t�6)�����M))Hnꤔ.$��{SwRJ�-pV�
U,q뫘΀�ng��R�R��m`g�RV�f��ʺxdi'�
�o������ˑ�y�}�{���'fu����"h���v��90[�<Fy�$�q0T?~*?�3�0�E�dL��t
�:]kS �F.��9���a�3tA�Ib$x� � �-Ea���ul�<j0�� ��G10r��W��e�i�B����( ��pj�-�����PY��<����|?J�(����Fr��&roj1�� 2:#פ��rq0�@�2Q׼}D��'����kϴD���h*���#2����N"1�%�%	.��G$~�����c�r�0�y�
>��N��v����Vv0��[x��yy��P� ͦ��R��&I��"e��'��{�>`�� [b��:&��N�,o�

In [None]:
""" 
Objective: Bypass JavaScript Challenge
"""
# TODO: Send a request to https://bookoutlet.com
# TODO: Use selenium to bypass the challenge
# TODO: Use hrequests to bypass the challenge or others library
# TODO: Compare those methods and share your thoughts
# TODO: Explore hrequests by yourself!
from selenium import webdriver
import time

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Optional: remove to see the browser
options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome(options=options)

driver.get("https://bookoutlet.com")
time.sleep(5)  # Let JS challenge run

print(driver.title)
print(driver.current_url)

driver.quit()

import nest_asyncio
nest_asyncio.apply()

import hrequests

response = hrequests.get("https://bookoutlet.com")
print(response.status_code)
print(response.url)
print(response.text[:500])


Just a moment...
https://bookoutlet.com/
zsh:1: no matches found: hrequests[all]
Note: you may need to restart the kernel to use updated packages.


200
https://bookoutlet.com
<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="google-site-verification" content="hwMYfD5FG7FRfi2tYTH3pjKyGldOafjhBgIYOZAgpr8"/><meta name="viewport" content="minimum-scale=1, initial-scale=1, width=device-width"/><title>Browse Discounted Books Online - Book Outlet</title><meta name="description" content="Save 50% off list prices on your next favourite read. Shop and enjoy Book Outlet&#x27;s wide range of kids, teens and adult books delivered straight to your doorstep."/


### **Reflection**
What do you know about TLS fingerprints? Write on your own words.

(answer here)

### **Exploration**
Explore another python library design for web scraping anti-bot like Botasaurus and Camoufox