# HTML & CSS

An essential part of the data collection and management module is learning how to get data from various sources using various techniques. One of these techniques, which is the focus of this lecture, is web scraping: getting data from web pages. To collect data from web pages in an automated manner, we have to dig into what web pages are made of, which is code, and learn how to understand and use the structure of that code!

## What you will learn during this course 🧐🧐

As a Data Scientist, you will sometimes be required to code web applications in which you will integrate your Machine Learning algorithms, and you will often have to get data from the web. That is why it is good to have a solid knowledge of web development. In this course we'll cover:

* How a website is structured
* How to create/read HTML tags
* What CSS is
* How to use CSS selectors
* The difference between inline/embedded & style sheet styling
* How to view and use the code from web pages


## Reminder: Tools you'll need ℹ️ℹ️

When it comes to web development, you will need:

* A code editor,
* A web browser.

We advise you to use <a href="https://atom.io/" target="_blank">Atom</a> or <a href="https://code.visualstudio.com/" target="_blank">VSCode</a> as your code editor. Note that Atom was discontinued by GitHub in December 2022, so VSCode is the safer choice today. (Reminder: a code editor is simply an improved text editor that recognises and highlights key elements in your code to make it more readable. It may also include more advanced features, such as connecting to a runtime to execute your code or connecting to GitHub and other services.)

Regarding web browsers, we advise you to use <a href="https://www.google.com/chrome/" target="_blank">Google Chrome</a>.

## HTML ⭐⭐

Think of HTML as the skeleton of a website. You structure your site using **HTML tags**. Each tag can be seen as a section of your website, like a navigation area with links inside it. Like Russian dolls, tags can be nested within one another, which lets you build more and more complex websites.

HTML files are read by web browsers (Chrome, Firefox to name a few) that will interpret the code and display a nice web page! 

### Structure of an HTML page 🏗️🏗️

Let's start by creating an HTML file. Go to your code editor and manually create a new file called `index.html`.

You just created your HTML file! You can now open it with your code editor and start writing your first HTML code:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Page title</title>
  </head>
  <body>
    Your feed
  </body>
</html>
```

As you can see, HTML elements are structured with tags. For example, all your content is contained within the `<html>` element. 

Most elements have an opening and a closing tag:

```html
<your_tag_name>something</your_tag_name>
```

It is very likely that you'll have nested elements within your page. In the example above, the `<head>` element is nested within `<html>`. Nesting is how you give a page its structure.

### Create a title 🏆

To create a title (a heading, in HTML terms), simply write the following tags:

```html
<h1>Title</h1>
<h2>Smaller title</h2>
<h3>Even smaller title</h3>
...
<h6>The smallest title</h6>
```

You can't go beyond `<h6>`, which is the smallest heading level in HTML.

### Paragraphs 📝

If you need to add text, you can enclose it within a `p` element.

```html
<p>Paragraph</p>
```

### Dividers 🗂️

A website is organized in sections. A smart way to separate these sections is with a `div` element. You will be able to put all the content related to this section inside it:

```html
<div>
  <h1>This is a section</h1>
  <p>We can put everything we want inside.</p>
</div>
```


### Lists ✅

If you need to create lists in HTML, you will use two elements:

* `ul` for _unordered list_,
* `li` for _list item_.

The first element will **generate a list** while the second one will **create the actual list item**:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Products</title>
  </head>
  <body>
    <h1>My favorite food</h1>
    <ul>
      <li>Burrito</li>
      <li>Dumplings</li>
      <li>Spaghetti</li>
    </ul>
  </body>
</html>
```

If you want to display an _ordered list_ instead, replace `ul` with `ol`!
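
As a quick illustration of the swap described above, here is the same list written as an ordered list. The items are simply reused from the previous example; the browser adds the numbering by itself.

```html
<!-- Same items as before: only the list tag changes from ul to ol -->
<ol>
  <li>Burrito</li>
  <li>Dumplings</li>
  <li>Spaghetti</li>
</ol>
```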

### Images 🖼️

We can add images with the `img` element. You will need to add attributes within the opening tag, such as `src` (the source, i.e. the location of the image).

```html
<img src="http://voyagerloin.com/vl-content/2014/04/natural-world-finalists-2.jpg" />
```

> NB: There is no `</img>` closing tag. `img` is a void element: it cannot contain other elements or text, so a single tag is enough.
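
Besides `src`, an `img` tag usually carries a few more attributes. The sketch below reuses the image URL from above; the `alt` text and the `width` value are placeholders chosen for illustration, not values taken from an actual page.

```html
<!-- alt: text used when the image cannot be displayed (also read by screen readers) -->
<!-- width: display width in pixels; with no height given, the aspect ratio is preserved -->
<img
  src="http://voyagerloin.com/vl-content/2014/04/natural-world-finalists-2.jpg"
  alt="Description of the picture"
  width="400"
/>
```

When scraping, attributes such as `src` and `alt` are often exactly the data you want to extract.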


### Create a navigation bar 🖱️

When you go to a website, it is common to have a menu to help you navigate. You can build a navigation bar the following way:

```html
<nav>
  <a href="index.html">Home</a>
  <a href="pages/about.html">About</a>
  <a href="pages/products.html">Products</a>
  <a href="pages/contact_us.html">Contact Us</a>
</nav>
```

Can you figure out the meaning of each HTML tag?

`<nav></nav>` represents a section dedicated to navigation, and `<a href="...">Link</a>` is a link to another page or to an external resource such as a website or a file.
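
The `href` attribute can hold either a relative path, as in the menu above, or a full URL. A minimal sketch, where the external address is only an example:

```html
<nav>
  <!-- Relative link: resolved from the location of the current page -->
  <a href="pages/about.html">About</a>
  <!-- Absolute link to an external website; target="_blank" opens it in a new tab -->
  <a href="https://developer.mozilla.org/" target="_blank">MDN Web Docs</a>
</nav>
```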


### Create a form

One last useful element in HTML: `form`. You can create a form using the `<form></form>` and `<input/>` tags. Let's create a sample form:

```html
<form>
  <div>
    <label>Email</label> <input required="required" type="email"/>
  </div>
  <div>
    <!-- Optional field: "required" is a boolean attribute, so it is simply left out -->
    <label>Image</label> <input type="file"/>
  </div>
  <div>
    <label>Compress Photo?</label>
    <input type="checkbox" value="1"/>
  </div>
  <br>
  <div>
    <label>Category</label>
    <select>
      <option value="1">Compliment</option>
      <option value="2">Complaint</option>
      <option value="3">Other</option>
    </select>
  </div>
  <hr>
  <div>
    <input type="submit" value="Submit"/>
  </div>
</form>
```

The `<input>` tag is used to create different types of input controls (text fields, checkboxes, file pickers, buttons...). The example above shows the main ones, but you can find more here 👉👉 <a href="https://www.w3schools.com/tags/tag_input.asp" target="_blank">HTML `<input>` Tag</a>

Also, `<label></label>` tags let you define a description for each `<input>` tag. This will help your users fill out the form.

In any case, remember that your form ⚠️ **MUST be enclosed within `<form></form>` tags** ⚠️.

Don't forget to include `<input type="submit">`. This is the button that sends the data filled in by the user to a given server.
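
To make these last points concrete, here is a reduced version of the form. The `action` URL and the `name`/`id` values are hypothetical: `action` tells the browser where to send the data, `method` how to send it, and each `name` is the key under which a value is submitted. Linking a `<label>` to its `<input>` with `for`/`id` also lets users click the label to focus the field.

```html
<!-- "/subscribe" is a made-up endpoint used for illustration only -->
<form action="/subscribe" method="post">
  <div>
    <!-- for="email" points to the input whose id is "email" -->
    <label for="email">Email</label>
    <input id="email" name="email" type="email" required/>
  </div>
  <div>
    <!-- Clicking this button sends the "email" value to the action URL -->
    <input type="submit" value="Submit"/>
  </div>
</form>
```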

## CSS 🎨🎨

CSS stands for `Cascading Style Sheets`. Think of it as the painter of your website, the one who makes it look beautiful.

### How to use CSS 🤔

CSS can be used by selecting an element in your HTML and applying some style to it. You can:

* Select a tag directly (`html`, `body`, `span`)
* Give a `class` to your tags and then use it for styling
* Give an `id` to your tags and then use it for styling

#### Select an element with its tag name 🤏

You can directly select an element by its name:

```css
html {
  background-color: blue;
}
```

Here we have given the `html` element a blue background color. The main advantage of this method is that the styling is applied to every element with that tag name.
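
To see this "applied to every element" behaviour, the rule can target a tag that appears several times. In the sketch below (the page content is invented for illustration), the rule sits in a `<style>` element, a method described later in this lecture. A single `p` rule colors both paragraphs, while the heading is left untouched:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Tag selector demo</title>
    <style>
      /* Applies to every <p> element on the page */
      p {
        color: blue;
      }
    </style>
  </head>
  <body>
    <h1>Not affected by the rule</h1>
    <p>First paragraph, displayed in blue.</p>
    <p>Second paragraph, also in blue.</p>
  </body>
</html>
```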

#### Select an element using classes 🤵

You can give your element a class name that you will insert inside an opening tag:

```html
<span class="class_name">hello</span>
```

Here we have just created a class called: `class_name`.

You can use this class name in several other tags if you want to:

```html
<div>
  <p><span class="class_name">Hello</span></p>
</div>

<div>
  <h2 class="class_name">I like cats</h2>
</div>
```

> NB: It is possible to give several class names to your element. All you have to do is separate them with space like so:

```html
<h1 class="title_1 homepage">Hello</h1>
```

Here, we have given two class names to our `h1` tag: `title_1` and `homepage`.

#### Select an element using ids #️⃣

Unlike a class, an id should be **unique** within a page and therefore cannot be used in several tags at the same time. Apart from this, it works the same way as a class:

```html
<h2 class="homepage" id="banner_title">Hello World</h2>
```

Here, `h2` has an id `banner_title`.

### Three ways to add CSS to a website 🎀

CSS can be added using three methods. Each has advantages and drawbacks.

#### Inline Styling

Inside an opening tag, you can add styles directly with the `style` attribute. It works like this:

```html
<tag style="property: value;">Hello world!</tag>
```

Or, for example:

```html
<span style="color: #fff; font-size: 10px;">Hello world!</span>
```

#### Embedded Styling

It is possible to add style rules directly in your HTML page, inside a `<style>` element placed within the `<head></head>` tags.

This is very handy if you only have a little styling to add to your page but need to set several properties on an element.

```html
<head>
  <style type="text/css">
    selector {
      property: value;
      property: value;
      property: value;
    }
  </style>
</head>
```

##### Select by class

You can select a class by adding a `.` before the class name. For example:

```html
<head>
  <style type="text/css">
    .menu-item {
      background-color: #00dbd0;
      border-bottom: 1px solid grey;
      min-height: 5%;
    }
  </style>
</head>
```

##### Select by id

To select by id, replace the `.` with `#`.

```html
<head>
  <style type="text/css">
    #profile_pic {
      border-radius: 50px;
      border: 1px solid grey;
      width: 20px;
    }
  </style>
</head>
```
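
Putting the selectors together: the following sketch reuses the `.menu-item` and `#profile_pic` rules from above on a small page. The page content and the image file name `me.png` are assumptions made for the example. The class is shared by two links, while the id appears only once.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Selectors demo</title>
    <style>
      /* Class selector: matches every element with class="menu-item" */
      .menu-item {
        background-color: #00dbd0;
        border-bottom: 1px solid grey;
      }
      /* Id selector: matches the single element with id="profile_pic" */
      #profile_pic {
        border-radius: 50px;
        border: 1px solid grey;
        width: 20px;
      }
    </style>
  </head>
  <body>
    <nav>
      <a class="menu-item" href="index.html">Home</a>
      <a class="menu-item" href="pages/about.html">About</a>
    </nav>
    <img id="profile_pic" src="me.png" alt="Profile picture" />
  </body>
</html>
```

The same `.` and `#` notation is also commonly used by scraping tools to locate elements in a page.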

#### Style Sheet

If you have a lot of styling to do on your web page, we advise you to put it in a separate file (named `style.css` for example). This file contains only CSS code and is linked to your HTML file.

To link it to your HTML code use:

```html
<head>
  <link rel="stylesheet" href="style.css">
</head>
```

This simply tells the browser to load `style.css` and apply its styles to the current web page.
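
The three methods can coexist on one page. The sketch below is only an illustration: `style.css` is assumed to exist next to the HTML file, and the colors are arbitrary. It shows that an inline style takes priority over an embedded rule for the same property.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Three ways to style</title>
    <!-- 1. Style sheet: rules live in a separate file (assumed to exist) -->
    <link rel="stylesheet" href="style.css">
    <!-- 2. Embedded styling: rules written in the page itself -->
    <style>
      p {
        color: green;
      }
    </style>
  </head>
  <body>
    <p>Green, from the embedded rule.</p>
    <!-- 3. Inline styling: overrides the embedded rule for this element only -->
    <p style="color: red;">Red, because the inline style wins.</p>
  </body>
</html>
```

If `style.css` also contained a plain `p { ... }` color rule, the embedded rule would still apply to the first paragraph, because with equal specificity the rule that comes later in the page wins.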

## How to view and use the code from web pages

We have now learned a little about how to write and structure our own webpages from scratch. However, the goal of this lecture is not to turn you into web developers or web designers, but into expert scrapers! The most important aspect of scraping is being able to dig out, see, and understand the code hidden behind a webpage!

### Inspect the webpage

Let's start by understanding how to view the source code of a webpage. Before we start, note that we use the Chrome browser for this demo, and that the exact steps might vary slightly depending on the web browser you are using.

1. First, open a web browser and go to a webpage like [this one](https://www.jedha.co/). You should see this:

![step1](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website.PNG)

2. Then you can right-click anywhere on the page:

![step2](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_right_click.PNG)

3. And click **Inspect**. You should then see the page split in two, with the browser view on one side and the source code on the other:

![step3](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_inspect.PNG)

4. The source code represents all the HTML code that the page is composed of, and that the browser interprets in order to display a beautiful webpage! It can be difficult to pinpoint exactly which part of the source code corresponds to which part of the webpage. To find the source code associated with a given part of the page more easily, you can use the following button, which lets you inspect a specific element of the webpage by clicking on it:

![step4](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_select_element_inspect.PNG)

5. As you hover over various elements of the webpage, here's what happens:

![step5](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_select_element_inspect_mouseover.PNG)

6. You can then click on an element to automatically scroll to the corresponding location in the source code:

![step6](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_select_element_inspect_selected.PNG)

7. If you wanted to indicate the position of this piece of source code in the webpage, you could list the lineage of parent tags containing it:

![step7](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_select_element_inspect_selected_parent_tag.PNG)

8. Or you could use the `Copy XPath` option by right-clicking the element in the source code:

![step8](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/0-M04-data-collection/jedha_website_select_element_inspect_selected_XPath.PNG)

The XPath represents the exact address of the element in the source code. For example, the XPath of the image element we selected on Jedha's landing page is: `/html/body/div[2]/div/div[2]/div[1]/img`.
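
To read such a path, follow it tag by tag from the root. The snippet below is a made-up page (not Jedha's actual source) whose structure matches the shorter path `/html/body/div[2]/div/img`. Note that XPath positions start at 1, so `div[2]` is the second `div` child of its parent.

```html
<html>
  <body>
    <div>First div child of body: /html/body/div[1]</div>
    <div>
      <!-- Second div child of body: /html/body/div[2] -->
      <div>
        <!-- Only div inside it: /html/body/div[2]/div -->
        <img src="logo.png" alt="Logo" />
        <!-- The image above is /html/body/div[2]/div/img -->
      </div>
    </div>
  </body>
</html>
```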

We'll use XPath a lot in what follows because it is an easy, straightforward way to pinpoint elements in the source without having to dissect the whole source code. Keep this whole process in mind, you will use it A LOT!

## Resources 📚📚

- <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element" target="_blank">All HTML elements</a>
- <a href="https://developer.mozilla.org/fr/docs/Web/CSS/Reference" target="_blank">CSS reference</a>