## This notebook explains
1. how to destring all string variables that contain only numeric values;
2. how to find non-numeric values in variables that look numeric (e.g., X in 0,1,1,0,X,0,1);
3. how to destring a specific variable.

## Useful resources:
1. https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/

In [1]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17", "mp")


  ___  ____  ____  ____  ____ ©
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Unlimited-user 2-core network, expiring 25 May 2022
Serial number: 501709318376
  Licensed to: Jaeyoon Yu
               Erasmus University

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000; see help set_maxvar.


In [2]:
%%stata
use https://stats.idre.ucla.edu/stat/stata/faq/hsbs, clear
list _all


. use https://stats.idre.ucla.edu/stat/stata/faq/hsbs, clear

. list _all

     +-----------------------------------------------+
     |  id   gender   race   schtyp   read   science |
     |-----------------------------------------------|
  1. |  70        m      1      pub     45        47 |
  2. | 121        f      1      pub     68        63 |
  3. |  86        m      1      pub     44        58 |
  4. | 141        m      1      pub     63        53 |
  5. | 172        m      1      pub     47        53 |
     |-----------------------------------------------|
  6. | 113        m      1      pub     44        63 |
  7. |  50        m      3      pub     50        53 |
  8. |  11        m      2      pub     34        39 |
  9. |  84        m      1      pub     63         . |
 10. |  48        m      3      pub     57        50 |
     |-----------------------------------------------|
 11. |  75        m      1      pub     60        53 |
 12. |  60        m      X      pub     57  

In [3]:
# describe the variables that look numeric; all four are stored as strings (str#)
%stata describe id race read science


Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
id              str3    %3s                   
race            str5    %9s                   
read            str2    %5s                   
science         str2    %5s                   


In [4]:
# destring every variable; only string variables made up entirely of numeric characters are converted
%stata destring _all, replace

id: all characters numeric; replaced as int
gender: contains nonnumeric characters; no replace
race: contains nonnumeric characters; no replace
schtyp: contains nonnumeric characters; no replace
read: all characters numeric; replaced as byte
science: all characters numeric; replaced as byte
(2 missing values generated)
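
The log above shows an all-or-nothing rule: a variable is converted only when every value is numeric, and a single stray character leaves the whole variable as a string. The plain-Python sketch below mirrors that behaviour. It is not how Stata implements `destring`; the function names `is_numeric_string` and `destring_column` and the sample lists are hypothetical, and Python's `float()` accepts a few spellings (such as "nan" or "inf") that are treated as numeric here, so this is only an approximation.

```python
def is_numeric_string(value):
    """Return True if the string parses as a number or is a missing marker ("" or ".")."""
    text = value.strip()
    if text in ("", "."):
        return True
    try:
        float(text)
        return True
    except ValueError:
        return False


def destring_column(values):
    """Convert a list of strings to numbers only if every entry is numeric.

    Returns (values, converted_flag). When any entry is non-numeric the
    original list is returned unchanged, like "no replace" in the log above.
    """
    if not all(is_numeric_string(v) for v in values):
        return values, False
    converted = [None if v.strip() in ("", ".") else float(v) for v in values]
    return converted, True


# Hypothetical sample values in the spirit of the race and read columns
race = ["1", "1", "3", "2", "X"]
read = ["45", "68", "44", "63", "."]

print(destring_column(race))  # left as strings because of "X"
print(destring_column(read))  # converted, with "." becoming None
```

The second call converts because "." is treated as a missing marker rather than as a non-numeric character, which is the same idea the notebook relies on later when it recodes "X" to ".".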


In [5]:
# describe again: "race" looks numeric but is still stored as a string
%stata describe id race read science


Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
id              int     %10.0g                
race            str5    %9s                   
read            byte    %10.0g                
science         byte    %10.0g                


In [6]:
# list the observations where race holds a non-numeric value
%stata list _all if missing(real(race))


     +----------------------------------------------+
     | id   gender   race   schtyp   read   science |
     |----------------------------------------------|
 12. | 60        m      X      pub     57        63 |
     +----------------------------------------------+
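
The filter `missing(real(race))` works because `real()` yields a missing value whenever a string cannot be read as a number. The same idea can be sketched in plain Python. The helper `to_number` is a hypothetical stand-in, not a Stata or pystata function, and the three rows are sample values copied from the listing above.

```python
def to_number(text):
    """Return a float, or None when the string cannot be parsed as a number."""
    try:
        return float(text)
    except ValueError:
        return None


# Sample rows taken from the listing above (id and race only)
rows = [
    {"id": "11", "race": "2"},
    {"id": "84", "race": "1"},
    {"id": "60", "race": "X"},
]

# Keep only the rows whose race value fails to parse
offending = [row for row in rows if to_number(row["race"]) is None]
print(offending)  # [{'id': '60', 'race': 'X'}]
```

In this sketch a value of "." or an empty string would also be flagged, because neither parses as a number, so the filter reports existing missing markers along with genuinely stray characters.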


In [7]:
# recode "X" as "." (Stata's missing value) so that race contains only numeric characters
%stata replace race="." if race=="X"

(1 real change made)


In [8]:
# destring only the variable race
%stata destring race, replace

race: all characters numeric; replaced as byte
(1 missing value generated)
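
The recode-then-convert pattern from the last two cells can be sketched in plain Python as follows. The function name `recode_and_convert`, its `bad_token` parameter, and the sample list are hypothetical, and the sketch assumes every value other than the recoded token is an integer string.

```python
def recode_and_convert(values, bad_token="X"):
    """Recode bad_token to ".", then convert to integers, with "." becoming None.

    Returns the converted list and the number of missing values generated.
    """
    cleaned = ["." if v == bad_token else v for v in values]
    numbers = [None if v == "." else int(v) for v in cleaned]
    n_missing = sum(1 for n in numbers if n is None)
    return numbers, n_missing


# Hypothetical sample in the spirit of the race column
race = ["1", "3", "2", "X", "1"]
numbers, n_missing = recode_and_convert(race)
print(numbers)    # [1, 3, 2, None, 1]
print(n_missing)  # 1
```

Recoding the single offending value explicitly, rather than forcing the conversion, keeps the decision about what counts as missing visible in the notebook.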


In [10]:
# describe once more: all four variables are now stored as numeric types
%stata describe id race read science


Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
id              int     %10.0g                
race            byte    %10.0g                
read            byte    %10.0g                
science         byte    %10.0g                
