# SELECT names

## Pattern Matching Strings
This tutorial uses the **LIKE** operator to check names. We will be using the SELECT command on the table world:

In [1]:
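# Load the tidyverse (dplyr, stringr), the DBI database interface,
# and getPass for a masked password prompt.
library(tidyverse)
library(DBI)
library(getPass)
# Choose the PostgreSQL ODBC driver name for the current operating system.
drv <- switch(Sys.info()['sysname'],
             Windows="PostgreSQL Unicode(x64)",
             Darwin="/usr/local/lib/psqlodbcw.so",
             Linux="PostgreSQL")
# Open the connection to the local sqlzoo database.
con <- dbConnect(
  odbc::odbc(),
  driver = drv,
  Server = "localhost",
  Database = "sqlzoo",
  UID = "postgres",
  PWD = getPass("Password?"),
  Port = 5432
)
# Show at most 20 rows when a data frame is displayed in the notebook.
options(repr.matrix.max.rows=20)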

-- Attaching packages --------------------------------------- tidyverse 1.3.0 --

v ggplot2 3.3.0     v purrr   0.3.3
v tibble  3.0.0     v dplyr   0.8.5
v tidyr   1.0.2     v stringr 1.4.0
v readr   1.3.1     v forcats 0.5.0

-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()





## 1.

You can use `WHERE name LIKE 'B%'` to find the countries that start with "B".

The % is a _wild-card_: it can match any sequence of characters, including none.
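
As a rough illustration of how the `%` wildcard lines up with the stringr helpers used in this notebook, the sketch below runs on a small hand-made vector. `sample_names` is an assumption made for the example; it is not the `world` table.

```r
library(stringr)

# Hypothetical sample; the real exercises query the world table.
sample_names <- c("Belgium", "Brazil", "Yemen", "Italy")

# LIKE 'B%'  -> names that start with B
starts_b <- sample_names[str_starts(sample_names, "B")]

# LIKE '%y'  -> names that end with y
ends_y <- sample_names[str_ends(sample_names, "y")]

# LIKE '%a%' -> names that contain a lowercase a anywhere
has_a <- sample_names[str_detect(sample_names, "a")]
```

These stringr patterns are case-sensitive regular expressions, which is why the cells in this notebook use character classes such as `[Yy]`.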

**Find the country that starts with Y**

In [2]:
world <- dbReadTable(con, 'world')

In [3]:
world %>% 
    filter(str_starts(name, '[Yy]')) %>% 
    select(name)

name
<chr>
Yemen


## 2.

**Find the countries that end with y**

In [4]:
world %>% 
    filter(str_ends(name, '[Yy]')) %>% 
    select(name)

name
<chr>
Germany
Hungary
Italy
Norway
Paraguay
Turkey
Uruguay
Vatican City


## 3.

Luxembourg has an **x** - so does one other country. List them both.

**Find the countries that contain the letter x**

In [5]:
world %>% 
    filter(str_detect(name, '[Xx]')) %>%
    select(name)

name
<chr>
Luxembourg
Mexico


## 4.

Iceland, Switzerland end with **land** - but are there others?

**Find the countries that end with land**

In [6]:
world %>% 
    filter(str_ends(name, 'land')) %>% 
    select(name)

name
<chr>
Finland
Iceland
Ireland
New Zealand
Poland
Swaziland
Switzerland
Thailand


## 5.

Colombia starts with a **C** and ends with **ia** - there are two more like this.

**Find the countries that start with C and end with ia**

In [7]:
world %>% 
    filter(str_detect(name, '^[Cc].*ia$')) %>%
    select(name)

name
<chr>
Cambodia
Colombia
Croatia


## 6.
Greece has a double **e** - who has a double **o**?

**Find the country that has oo in the name**

In [8]:
world %>% 
    filter(str_detect(name, 'oo')) %>% 
    select(name)

name
<chr>
Cameroon


## 7.

Bahamas has three **a** - who else?

**Find the countries that have three or more a in the name**

In [9]:
world %>% 
    filter(str_detect(name, 'a.*a.*a')) %>% 
    select(name)

name
<chr>
Antigua and Barbuda
Bahamas
Bosnia and Herzegovina
Canada
Equatorial Guinea
Guatemala
Jamaica
Kazakhstan
Madagascar
Malaysia
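
The pattern `'a.*a.*a'` is case-sensitive, so a capital "A" is not counted. Whether SQL's LIKE ignores case depends on the database and its collation, so the two approaches may disagree. A minimal sketch of a counting alternative that treats "A" and "a" alike, using a hand-made `sample_names` vector (an assumption for illustration only):

```r
library(stringr)

# Hypothetical sample; not read from the world table.
sample_names <- c("Bahamas", "Canada", "Afghanistan", "Peru")

# Count occurrences of a/A by lower-casing first.
a_count <- str_count(str_to_lower(sample_names), "a")
three_or_more <- sample_names[a_count >= 3]

# The case-sensitive regex misses names whose third "a" is a capital.
regex_hit <- str_detect("Afghanistan", "a.*a.*a")
```

Here `regex_hit` is `FALSE` because "Afghanistan" has only two lowercase "a" characters, while the counting version keeps it.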


## 8.

India and Angola have an **n** as the second character. You can use the underscore as a single character wildcard.

```sql
SELECT name FROM world
 WHERE name LIKE '_n%'
ORDER BY name
```
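
Since the R cells in this notebook use regular expressions rather than LIKE, it can help to see how the two wildcards translate. The helper below is a hypothetical sketch, not part of stringr or DBI: it assumes the LIKE pattern has no ESCAPE clause, maps `%` to `.*` and `_` to `.`, and treats every other character literally.

```r
library(stringr)

# Hypothetical helper: turn a LIKE pattern into an anchored regex.
like_to_regex <- function(pattern) {
  # Regex metacharacters that must be escaped to match literally.
  meta <- c(".", "\\", "|", "(", ")", "[", "]", "{", "}",
            "^", "$", "*", "+", "?")
  chars <- strsplit(pattern, "")[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"                  # any run of characters, possibly empty
    } else if (ch == "_") {
      "."                   # exactly one character
    } else if (ch %in% meta) {
      paste0("\\", ch)      # escape so it matches itself
    } else {
      ch
    }
  }, character(1))
  # LIKE must match the whole string, so anchor both ends.
  paste0("^", paste(parts, collapse = ""), "$")
}

second_is_n <- str_detect(c("India", "Angola", "Italy"), like_to_regex("_n%"))
```

With this translation, `'_n%'` becomes `^.n.*$`, which matches "India" and "Angola" but not "Italy".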

**Find the countries that have "t" as the second character.**

In [10]:
world %>% 
    filter(str_detect(name, '^.{1}t')) %>%
    select(name)

name
<chr>
Ethiopia
Italy


## 9.

Lesotho and Moldova both have two o characters separated by two other characters.

**Find the countries that have two "o" characters separated by two others.**

In [11]:
world %>%
    filter(str_detect(name, 'o.{2}o')) %>% 
    select(name)

name
<chr>
"Congo, Democratic Republic of"
"Congo, Republic of"
Lesotho
Moldova
Mongolia
Morocco
Sao Tomé and Príncipe


## 10.

Cuba and Togo have four-character names.

**Find the countries that have exactly four characters.**

In [12]:
world %>% 
    filter(str_detect(name, '^.{4}$')) %>%
    select(name)

name
<chr>
Chad
Cuba
Fiji
Iran
Iraq
Laos
Mali
Oman
Peru
Togo


## 11.

The capital of **Luxembourg** is **Luxembourg**. Show all the countries where the capital is the same as the name of the country.

**Find the country where the name is the capital city.**

In [13]:
world %>% 
    filter(name==capital) %>% 
    select(name)

name
<chr>
Djibouti
Luxembourg
San Marino
Singapore


## 12.

The capital of **Mexico** is **Mexico City**. Show all the countries where the capital has the country together with the word "City".

**Find the country where the capital is the country plus "City".**

> _The concat function_    
> The function concat is short for concatenate - you can use it to combine two or more strings.
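
In R, the closest counterparts to SQL's concat are `paste()` and `paste0()`. The short sketch below shows the difference in separators; the variable names are chosen for illustration only.

```r
country <- "Mexico"

# paste() inserts a single space between its arguments by default.
with_space <- paste(country, "City")

# paste0() joins its arguments with no separator,
# so the space has to be part of one of the strings.
no_space <- paste0(country, "City")
explicit <- paste0(country, " City")
```

This is why `paste(name, 'City')` can be compared directly with a capital such as "Mexico City", whereas a concat call in SQL would need the space written into the literal, as in `' City'`.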

In [14]:
world %>% 
    filter(capital==paste(name, 'City')) %>% 
    select(name)

name
<chr>
Guatemala
Kuwait
Mexico
Panama


## 13.

**Find the capital and the name where the capital includes the name of the country.**

In [15]:
world %>%
    filter(str_detect(capital, name)) %>% 
    select(capital, name)

capital,name
<chr>,<chr>
Andorra la Vella,Andorra
Djibouti,Djibouti
Guatemala City,Guatemala
Kuwait City,Kuwait
Luxembourg,Luxembourg
Mexico City,Mexico
Monaco-Ville,Monaco
Panama City,Panama
San Marino,San Marino
Singapore,Singapore
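
One caveat with `str_detect(capital, name)` is that each name is interpreted as a regular expression. That works for the names shown here, but a name containing a character such as `.` or `(` would be read as regex syntax. A minimal sketch of literal matching with stringr's `fixed()`, using made-up vectors (the "A.B" entry is purely hypothetical):

```r
library(stringr)

# Hypothetical sample data; not read from the world table.
capital <- c("Mexico City", "Luxembourg", "Paris")
name    <- c("Mexico", "Luxembourg", "France")

# fixed() compares the pattern byte for byte, with no regex syntax.
literal_match <- str_detect(capital, fixed(name))

# With a metacharacter in the pattern the two modes can disagree.
as_regex   <- str_detect("AxB City", "A.B")         # "." matches "x"
as_literal <- str_detect("AxB City", fixed("A.B"))  # needs a real "."
```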


## 14.

**Find the capital and the name where the capital is an extension of the name of the country.**

You _should_ include **Mexico City** as it is longer than **Mexico**. You _should not_ include **Luxembourg** as the capital is the same as the country.

In [16]:
world %>% 
    filter(str_detect(capital, name) & capital != name) %>% 
    select(capital, name)

capital,name
<chr>,<chr>
Andorra la Vella,Andorra
Guatemala City,Guatemala
Kuwait City,Kuwait
Mexico City,Mexico
Monaco-Ville,Monaco
Panama City,Panama


## 15.

For **Monaco-Ville** the name is **Monaco** and the extension is **-Ville**.

**Show the name and the extension where the capital is an extension of the name of the country.**

You can use the SQL function [REPLACE](https://sqlzoo.net/wiki/REPLACE).
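
On the R side, stringr offers `str_replace()` and `str_replace_all()`. The sketch below, on made-up vectors, shows how they differ: `str_replace()` changes only the first match, while `str_replace_all()` changes every match, which is closer to how SQL's REPLACE behaves.

```r
library(stringr)

# Hypothetical sample data; not read from the world table.
capital <- c("Monaco-Ville", "Mexico City")
name    <- c("Monaco", "Mexico")

# Remove the country name from the capital, matching it literally.
extension <- str_replace_all(capital, fixed(name), "")

# First match only versus every match.
first_only <- str_replace("banana", "an", "")
every_one  <- str_replace_all("banana", "an", "")
```

Note that the extension for "Mexico City" keeps its leading space, so a call to `str_trim()` may be wanted depending on how the result is used.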

In [17]:
world %>% 
    filter(str_detect(capital, paste('^', name, '.+$', sep=''))) %>% 
    mutate(extension=str_replace(capital, name, '')) %>% 
    select(name, extension)

name,extension
<chr>,<chr>
Andorra,la Vella
Guatemala,City
Kuwait,City
Mexico,City
Monaco,-Ville
Panama,City


In [18]:
dbDisconnect(con)